Auto tokenizer and text variable settings

The advanced auto tokenizer feature makes it easier to work with text variables and Word Clouds in DataTile. It activates the text variable settings and splits text content into individual words for display in the Word Cloud chart.

The auto tokenizer is now enabled by default in DataTile. With it on, DataTile splits the text blocks within a text variable into individual words and builds the Word Cloud chart from those words. The text variable settings are enabled as well.

Text variable settings

In Meta-Editor, select the text variable from the database and click the ‘cog’ icon next to it.

In the popup window, you can specify a minimum word length. This removes words that are too short, usually articles and prepositions, from the Word Cloud.

By entering words in the ‘Stop words’ box, you can also exclude words that the length filter does not catch.

As a result, if you need separate words, hashtags, or categories for your analysis, you don't need to encode them separately in the dataset. The auto tokenizer also lets you build the Word Cloud from terms that are part of longer text blocks, such as reviews, opinions, and comments.
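
The effect of the two filters described above (minimum word length and stop words) can be illustrated with a short Python sketch. This is not DataTile's implementation: the function name word_frequencies, its parameters, the lowercasing, and the splitting rule are all assumptions made for illustration only.

```python
import re
from collections import Counter


def word_frequencies(responses, min_word_length=3, stop_words=()):
    """Count words across text responses after applying two filters.

    Hypothetical helper: approximates the idea of a minimum word length
    and a stop-word list; it is not how DataTile works internally.
    """
    excluded = {word.lower() for word in stop_words}
    counts = Counter()
    for text in responses:
        # Lowercase and split into word tokens; an optional leading '#'
        # is kept so hashtags stay intact (assumed behaviour).
        for token in re.findall(r"#?\w+", text.lower()):
            if len(token) < min_word_length:
                continue  # too short, e.g. articles and prepositions
            if token in excluded:
                continue  # listed explicitly as a stop word
            counts[token] += 1
    return counts


# Made-up sample responses
responses = [
    "The delivery was fast and the price was fair",
    "Fast delivery, but the box was damaged",
]
freq = word_frequencies(responses, min_word_length=4, stop_words=["fair"])
```

With a minimum length of 4, short words such as "the", "was", and "box" drop out, while "fair" is long enough to pass the length filter and is removed only because it is listed as a stop word.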

How to disable the auto tokenizer

If your research does not involve dividing text variables into words and you want to use whole expressions in the Word Cloud, you need to switch the auto tokenizer off.

Go to Database settings → Settings → Feature availability and untick the checkbox ‘Auto tokenizer for Word Cloud’.
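
To picture what the setting changes, the following Python sketch counts the same answers in two ways: as whole expressions and as individual words. It only illustrates the concept; the sample answers are made up, and splitting on whitespace is an assumption, not DataTile's actual tokenization rule.

```python
from collections import Counter

# Made-up values of a text variable
answers = ["Very satisfied", "Very satisfied", "Not satisfied"]

# Auto tokenizer off (assumed behaviour): each whole value is one item
whole_expressions = Counter(answers)

# Auto tokenizer on (assumed behaviour): each value is split into words first
single_words = Counter(word for answer in answers for word in answer.split())
```

Counted as whole expressions, "Very satisfied" and "Not satisfied" remain distinct items. Counted as single words, "satisfied" becomes the most frequent term and the distinction between the two answers is lost, which is why whole expressions can be the better choice for short categorical answers.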
