Text Refiner in DataTile

Required system role: Coder

The text refiner is an efficient tool for automatically encoding responses, such as in unaided awareness questions. Beyond eliminating manual encoding, you can also create and store coding dictionaries. These can be applied within a project — for instance, during subsequent tracking waves — and repurposed for different projects, such as ad hoc studies on the same brand.

How to work with the Text Refiner

Important!

If a dataset needs to be encoded, upload it to DataTile as ‘Text refiner’, NOT as ‘Database’.

Upload as Text refiner

The uploaded file appears in ‘My Space’. Open it from there to start working in the Text Refiner Space.

The Text Refiner Space

Once the dataset is opened, locate the variable in the left pane that holds the data for re-coding. Click the downward arrow above it and select 'Encode'. The data from that column will then be presented as a dynamic table on the screen.

Encode variables

The data from the encoded column is segmented into strings based on distinct variations (clusters). All clearly matched variations are grouped into a single cluster. The total cluster count is displayed above the table. The 'count' column indicates the precise occurrence of each cluster within the data.

You can search the table by name and by count:

  • Use the search by name function to locate clusters containing the entered keywords.

  • When searching by count, you can employ mathematical symbols (<, <=, >=, >) to filter clusters based on their numbers being above or below a certain value. Inputting a specific number will display only clusters with that exact count.
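The count search described above can be pictured with a small sketch. This is only an illustration of the filtering rule, not DataTile's implementation; the helper names `parse_count_query` and `filter_clusters` and the sample counts are hypothetical.

```python
import operator
import re

# Comparison symbols supported by the count search, per the text above.
_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}

def parse_count_query(query):
    """Turn a query such as '>=10' or '25' into a predicate on a count."""
    match = re.fullmatch(r"\s*(<=|>=|<|>)?\s*(\d+)\s*", query)
    if match is None:
        raise ValueError(f"unsupported count query: {query!r}")
    symbol, number = match.group(1), int(match.group(2))
    # A bare number (no symbol) means an exact match.
    compare = _OPS.get(symbol, operator.eq)
    return lambda count: compare(count, number)

def filter_clusters(clusters, query):
    """Keep only the clusters whose count satisfies the query."""
    keep = parse_count_query(query)
    return {name: count for name, count in clusters.items() if keep(count)}

# Made-up cluster counts for illustration.
clusters = {"BMW": 120, "bmw": 14, "Toyota": 95, "toyta": 2}
print(filter_clusters(clusters, ">=95"))
print(filter_clusters(clusters, "14"))
```

Filtering on a low count (for example `<3`) is a quick way to surface rare spellings that are likely typos.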

The square checkboxes on the left allow for cluster selection. To select or deselect all clusters, use the checkbox located above the table.

Split clusters

An advanced feature for enhancing multi-response questions allows you to separate multiple answers and then recombine them after applying the Text Refiner. Utilize this function for responses where multiple answers are provided in a single cell separated by delimiters (e.g., "Apple, Samsung, Honor").

  • Click on SPLIT in the top menu in the Text Refiner Space;

  • Choose a delimiter that separates multi-responses in your file and click OK.

  • Repeat if you have several delimiters.

  • After refining the text, you can download a file with the updated clusters.

How to split clusters

Note! All the split clusters will be arranged in respective rows. However, clusters that were previously divided will appear in individual columns, which the system will append to the file.
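As a rough sketch of what splitting does to the data, the Python below separates multi-response cells on several delimiters, counts the resulting clusters, and lays the answers out in one column per answer. The function name `split_responses` and the padded-column layout are assumptions made for illustration; the exact layout of the downloaded file may differ.

```python
import re
from collections import Counter

def split_responses(responses, delimiters):
    """Split each cell on any of the delimiters and drop empty pieces."""
    pattern = "|".join(re.escape(d) for d in delimiters)
    rows = []
    for cell in responses:
        parts = [part.strip() for part in re.split(pattern, cell)]
        rows.append([part for part in parts if part])
    return rows

responses = ["Apple, Samsung, Honor", "Samsung; Apple", "Honor"]
rows = split_responses(responses, [",", ";"])

# Each separated answer now counts as its own cluster.
cluster_counts = Counter(item for row in rows for item in row)

# One row per respondent, one column per answer, padded with empty cells.
width = max(len(row) for row in rows)
table = [row + [""] * (width - len(row)) for row in rows]

print(cluster_counts)
print(table)
```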

Replace clusters

If your goal isn't to merge clusters but to modify a word or a few characters, utilize the 'replace' function.

  • Type the characters you want to replace into the ‘Search by name’ field.

  • Click REPLACE in the top menu.

  • Type in the replacement in the popup and click OK.

How to replace clusters

Merge clusters by items

The core functionality of the Text Refiner is merging clusters to reduce their number.

The first approach is to merge clusters by items.

  • Choose the cluster with the correct wording and mark it as 'approved' using the 'approve' radio button next to it.

  • All clusters that differ from it only in letter case are merged with the approved one immediately. For example, if you mark ‘BMW’ as the approved cluster, variations such as ‘bmw’, ‘Bmw’, and ‘BmW’ are merged into ‘BMW’.

  • Click on the approved cluster to check the list of all merged items on the right pane.

Approve and merge the items
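The case-only merge that follows an approval can be sketched as below. This is a minimal illustration under the assumption that matching is a plain case-insensitive comparison; `merge_case_variants` is a hypothetical helper, and the counts are invented.

```python
from collections import defaultdict

def merge_case_variants(clusters, approved):
    """Fold clusters that differ from an approved name only by letter case."""
    approved_by_key = {name.casefold(): name for name in approved}
    merged = {}
    members = defaultdict(list)
    for name, count in clusters.items():
        # Fall back to the cluster's own name when nothing approved matches.
        target = approved_by_key.get(name.casefold(), name)
        merged[target] = merged.get(target, 0) + count
        if target != name:
            members[target].append(name)
    return merged, dict(members)

clusters = {"BMW": 120, "bmw": 14, "Bmw": 3, "BmW": 1, "Toyota": 95}
merged, members = merge_case_variants(clusters, approved=["BMW"])
print(merged)
print(members)
```

The `members` mapping plays the role of the right pane: it lists which original clusters were folded into each approved one.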

Alternatively, you can merge items manually:

Select all items that need to be merged and click the ‘Merge’ button above the table. By default, the system proposes the first chosen cluster as the primary one, but you have the flexibility to rename it during the merging process.

Manual merging

If some clusters are irrelevant, label them as 'SYSMISS' to exclude them from calculations. All these marked items will be grouped under a single 'SYSMISS' cluster, which you can further edit if needed.

After the merging is completed, you can return to any merged item and revise it. Click on a merged cluster, and the right pane will display all the incorporated clusters. To extract a cluster from the merged group, simply select it and confirm its removal. Utilize the search bar for faster navigation.

Merge by Clusters

The second approach is to merge clusters using the smart DataTile system.

  • Navigate to the 'Clusters' tab located in the top left corner;

  • The data is now presented as cluster sets, grouped based on a text similarity method;

  • Choose a method to view the cluster sets prepared for merging;

We recommend exploring different methods to see which one works best for your dataset.
DataTile currently provides the following methods:

  • Levenshtein;

  • Jaro-Winkler;

  • Keyboard switcher (to English);

  • Damerau;

  • Longest Common Subsequence;

  • Russian-English (based on Levenshtein for both languages).

  • Adjust the thresholds to specify the allowable word variance and determine word size for more precise cluster groupings.

  • Review the grouped clusters for accuracy. Remove any irrelevant clusters from the sets as needed. To exclude a cluster, either use the deletion button to its left or simply click on it.

  • Verify and, if necessary, modify the name under which the clusters will be combined.

  • Once satisfied with the clusters in the set and its designated name, click the 'Merge' button adjacent to the set name. This consolidated set will then disappear from the 'Clusters' tab but can be viewed under the 'Items' tab, where you can inspect the merged clusters.
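To give a feel for how a similarity method and its thresholds interact, here is a small sketch using Levenshtein distance. It is not DataTile's algorithm: the greedy grouping strategy, the parameter names `max_distance` and `min_length`, and the sample counts are all assumptions chosen for illustration.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def propose_sets(clusters, max_distance=2, min_length=4):
    """Greedy grouping: a cluster joins the first set whose seed is close enough.

    Clusters are visited from most to least frequent, so the most common
    spelling becomes the seed (and proposed name) of its set.
    """
    sets = []
    for name in sorted(clusters, key=clusters.get, reverse=True):
        key = name.casefold()
        for group in sets:
            seed = group[0].casefold()
            long_enough = min(len(key), len(seed)) >= min_length
            if long_enough and levenshtein(key, seed) <= max_distance:
                group.append(name)
                break
        else:
            sets.append([name])
    # Only sets with something to merge are worth reviewing.
    return [group for group in sets if len(group) > 1]

clusters = {"Toyota": 95, "Toyta": 4, "Tayota": 2, "Mazda": 40,
            "Mazada": 3, "BMW": 120, "KIA": 10}
for group in propose_sets(clusters):
    print(group[0], "<-", group[1:])
```

In this sketch, raising `max_distance` makes the sets larger and less precise, while `min_length` keeps very short names such as 'BMW' and 'KIA' from being grouped together by accident. That is why reviewing each proposed set before merging matters.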

Save & Share Dictionaries

DataTile Text Refiner keeps a record of your cluster merging activities. For instance, if you've merged 7,650 clusters down to 150, you can:

a. Review all merged clusters in the right pane of the 'Items' tab by selecting a cluster.

b. Save this merge history as a dictionary, eliminating the need for similar merging actions in the future.

Click on the red ‘Dictionary’ icon and select 'export to dictionary'. Ensure you provide a descriptive name for the dictionary for effortless retrieval later on.

To leverage a previously created dictionary:

  • Click the red ‘dictionary’ icon.

  • From the dropdown, pick the desired dictionary.

  • Choose 'import from dictionary'. The dictionary will be applied instantly.

Any data not pre-encoded in the dictionary can be encoded as described and subsequently added back to the dictionary. With a few iterations, you can refine a dictionary capable of encoding over 99% of the responses.
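The iterate-and-extend workflow can be sketched as follows. The dictionary is modeled here as a plain mapping from a raw spelling to its merged cluster name, and lookups are assumed to ignore case and surrounding spaces; both are illustrative assumptions, not a description of DataTile's dictionary format.

```python
def apply_dictionary(responses, dictionary):
    """Encode responses with a dictionary and report what it could not cover."""
    encoded, unmatched = [], []
    for text in responses:
        key = text.strip().casefold()
        if key in dictionary:
            encoded.append(dictionary[key])
        else:
            encoded.append(text)      # left as-is for manual encoding
            unmatched.append(text)
    coverage = 1 - len(unmatched) / len(responses) if responses else 0.0
    return encoded, unmatched, coverage

# Dictionary saved after an earlier wave (hypothetical content).
dictionary = {"bmw": "BMW", "toyta": "Toyota", "toyota": "Toyota"}
new_wave = ["BMW", "toyta", "Toyota", "Mazada"]

first_pass = apply_dictionary(new_wave, dictionary)
print(first_pass)

# Encode the leftover manually, add it to the dictionary, and re-apply.
dictionary["mazada"] = "Mazda"
second_pass = apply_dictionary(new_wave, dictionary)
print(second_pass)
```

Each round shrinks the unmatched list, which is how the share of automatically encoded responses grows across waves.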

Import from dictionary

Stored dictionaries are located in ‘My Space’ as distinct items. From there, you can download, share, or delete them as needed.

Set up a code to a cluster

Although the Text Refiner primarily focuses on editing variables rather than categorizing them, you can assign a code to a cluster. This code will be saved in an SPSS file as a categorical variable defining a specific cluster.

Suppose you want to assign codes to clusters as follows: 'Toyota' gets code 1, 'Mazda' gets code 2, and 'BMW' gets code 3. After merging these clusters, replace the '?' in the code column with the respective numbers '1', '2', or '3' and then save them.
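Conceptually, the result is a second, categorical column alongside the refined text. The sketch below shows that mapping with the example codes from above; the helper `assign_codes` is hypothetical, and using `None` for clusters whose code is still '?' is an assumption about how uncoded values would be represented.

```python
def assign_codes(refined_values, codes):
    """Pair each refined value with its numeric code (None if not yet coded)."""
    return [(value, codes.get(value)) for value in refined_values]

# Codes entered in the code column, as in the example above.
codes = {"Toyota": 1, "Mazda": 2, "BMW": 3}

refined_values = ["Toyota", "BMW", "Mazda", "Kia"]
coded = assign_codes(refined_values, codes)
print(coded)
```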

Downloading a refined file

Once the text refining process is complete, you can download the refined dataset and re-upload it as ‘Database’. Four download formats are available:

  • csv;

  • Excel;

  • Project;

  • sav.

Download a refined dataset
