Best practices for structuring data for trend analysis

General Recommendations

SPSS is preferable

DataTile accepts a variety of input formats, such as Excel and CSV. However, these text-based formats can be prone to errors, especially when used for tracking surveys and syndicated studies.

Use the SPSS file format for the best experience. SPSS files are recommended for seamless uploading of new waves into a tracking database. DataTile automatically matches and merges new data into existing databases and updates all deliverables, such as dashboards and reports.

You can maintain tracking databases by uploading files manually via the interface or by pushing them via the API. A typical workflow looks like this:

Create a database from the first wave of the survey.
Set up deliverables - tidy up the codebook, create dashboards, and configure reporting.
Add new waves to the database or completely reload it if your data processing team provides all interviews in one file.

Restrictions on variable names

DataTile fully supports all types of characters and symbols in variable names. However, we recommend restricting characters to ensure compatibility with other solutions and locales that might be used alongside DataTile. Following these guidelines facilitates seamless data flow across suppliers and systems internationally.

To ensure compatibility, we advise restricting variable names in DataTile to the following:

Use the Latin alphabet, numbers, and universally supported symbols (# _ . @).
Avoid spaces, special characters, umlauts, accented symbols, and brackets.
Avoid using SPSS reserved words as variable names: ALL, AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO, WITH.
Ensure that variable names begin with letters from the Latin alphabet.
Limit variable name length to 64 characters.

These recommendations can enhance compatibility and streamline data integration across different tools and platforms when using DataTile.
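The guidelines above can be checked before a file is uploaded. The following is a minimal sketch, not part of DataTile: the function name check_variable_name is hypothetical, and it assumes that "reserved word" means the whole name equals one of the listed keywords, compared case-insensitively.

```python
import re

# Reserved words listed in the guidelines above.
SPSS_RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
                 "NE", "NOT", "OR", "TO", "WITH"}

# A Latin letter first, then Latin letters, digits, or the symbols # _ . @
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9#_.@]*")

MAX_NAME_LENGTH = 64


def check_variable_name(name: str) -> list[str]:
    """Return a list of guideline violations (empty if the name is fine)."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append("longer than 64 characters")
    if not NAME_PATTERN.fullmatch(name):
        problems.append("must start with a Latin letter and use only "
                        "letters, digits, # _ . @")
    if name.upper() in SPSS_RESERVED:
        problems.append("SPSS reserved word")
    return problems
```

Running the check over every variable name in a wave file and reviewing the non-empty results is usually enough to catch names that would cause trouble in other tools.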

Variable and category labels

In DataTile, there are limitations on the length of labels as follows:

Variable labels are truncated at 1024 characters.
Category labels are truncated at 512 characters.

It is common practice to incorporate category labels of dichotomous multi-response sets into variable labels using the format: {{QUESTION_TEXT}} - {{CATEGORY_LABEL}}.

If variable labels exceed 1024 characters, it is recommended to trim the common part of the question text rather than the unique part that identifies the category. Doing so lets you retain the crucial information that helps identify the category.

DataTile can automatically derive category labels when creating multi-response sets. Therefore, keeping category labels intact can save significant time and prevent issues with label management after loading the data.

By following this approach, you can effectively manage label length limitations while preserving essential information and simplifying the labeling process in DataTile.
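As an illustration of trimming the common part rather than the unique part, here is a minimal sketch. The function name fit_variable_label is hypothetical, and the code assumes the question text and category label are separated by " - " and that the last such separator marks the start of the category.

```python
MAX_VARIABLE_LABEL = 1024  # limit for variable labels stated above
SEPARATOR = " - "


def fit_variable_label(label: str, limit: int = MAX_VARIABLE_LABEL) -> str:
    """Shorten the question text so the category part of the label survives."""
    if len(label) <= limit:
        return label
    question, sep, category = label.rpartition(SEPARATOR)
    if not sep:
        # No separator found: nothing identifies a category, so cut the tail.
        return label[:limit]
    keep = limit - len(sep) - len(category)
    if keep <= 0:
        # The category alone exceeds the limit; keep as much of it as fits.
        return category[:limit]
    return question[:keep].rstrip() + sep + category
```

A category label that itself contains " - " would be split in the wrong place, so such labels need a different separator or manual handling.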

SPSS File Encoding

To ensure seamless loading, provide SPSS files saved in Unicode mode or with an explicitly defined locale.

How to change the encoding of your SPSS file

To change the encoding of your file in SPSS, follow these steps. Please note that the screens may vary depending on the version of SPSS you are using:

Before opening the dataset you wish to examine, click ‘Edit → Options’ in the SPSS menu.
In the options window that opens, switch to the ‘Language’ tab.
1. Choose ‘Unicode (universal encoding)’ to set UTF-8 as the default encoding for data and syntax in SPSS.
2. Click ‘OK’ to apply the changes.
To verify that the encoding change has been applied, look at the bottom right corner of the SPSS program window. It should display ‘Unicode: ON’.
If you see the text ‘Unicode: OFF’ inside a red circle, the ‘Character Encoding for Data and Syntax’ field in the ‘Language’ tab is set to ‘Locale's writing systems’ instead of ‘Unicode (universal encoding)’.
To convert a file that is not yet encoded as Unicode to UTF-8:
1. Open the file in SPSS. Once Unicode is enabled, a pop-up window with a prompt will appear.
2. Choose ‘Yes’ to optimize the number of bytes. The file will now be saved in the UTF-8 format.

Labels aren’t traced

Variable and value labels do not participate in data tracking and matching. Labels in DataTile, once created in a database, prevail over those provided in the subsequent uploads.

You can edit existing labels directly in DataTile, update them in bulk by importing an Excel codebook, or instruct DataTile to override labels when replacing data in the database (see below).

Wave variables and datasets

DataTile maintains the registry of files imported into its database by generating a virtual variable, $META_VOLUME. The categories within this variable represent the files loaded into the database, and each interview is linked to its respective category.

Like any standard variable, you can modify the labels of the $META_VOLUME variable and its associated categories. DataTile automatically removes the related interviews and the corresponding category from the $META_VOLUME variable if a data set is deleted from the database.

Though it’s possible to use the $META_VOLUME variable as a wave for trending, we recommend explicitly embedding wave variables into your data sets to ensure control and stability over trending reports.

Loading Data into DataTile Database

Matching Variables and Categories

When importing new data files into an existing database, DataTile strives to align the variables from the uploaded file with those already present in this database, effectively integrating everything into a single consolidated database.

Consider it like a data processing task where you're merging two SPSS files into a single dataset. The resultant dataset would comprise a union of all unique variables and the categories within these variables.

To enhance the robustness of data loading operations, DataTile uses case-insensitive variable matching. For example, variable names like tom, Tom, and TOM are all treated as identical.
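The merge behavior described above can be modelled in a few lines. This is only a sketch of the documented rules (union of variables and categories, case-insensitive name matching, existing labels prevail), not DataTile's actual implementation. The function name merge_codebooks and the dictionary layout (variable name mapped to a code-to-label dictionary) are assumptions made for illustration.

```python
def merge_codebooks(database: dict, new_file: dict) -> dict:
    """Union of variables and categories; names match case-insensitively.

    Each codebook maps a variable name to {code: label}.
    Labels already in the database prevail over labels in the new file.
    """
    merged = {name: dict(cats) for name, cats in database.items()}
    # Index existing names by their lower-cased form.
    by_lower = {name.lower(): name for name in merged}

    for name, categories in new_file.items():
        target = by_lower.get(name.lower())
        if target is None:
            # No match: the variable is added as a new one.
            merged[name] = dict(categories)
            by_lower[name.lower()] = name
            continue
        for code, label in categories.items():
            # setdefault keeps the existing label when the code is known.
            merged[target].setdefault(code, label)
    return merged
```

Note that a known code keeps its database label even when the new file labels it differently, which is why inconsistent coding between waves goes unnoticed.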

The examples below demonstrate typical situations, with the presumption that the tracking survey includes a "Top of mind" [tom] question (e.g., "What is the first brand that comes to mind when you think about soft drinks?").

The table on the left represents the data structure in the DataTile database, while the right table depicts the structure of the corresponding variable from the SPSS file that we're integrating into the database.

The examples below employ DataTile Codebook notation to depict the data layout of a dataset. CAT in the first column indicates the variable type, which, in this case, is Categorical.

Case 1.

Identical variables. Nothing special happens: DataTile concatenates the tom column vertically. Value labels aren’t affected.

Case 2.

AquaZen is coded as '1' in the first wave, but in the new SPSS file '1' stands for 'Jumbo Cola'. DataTile will ignore the inconsistency in labels and record every '1' response as AquaZen.

This would be a severe error in almost all cases.

Case 3.

NectarNation [4] was asked about in the first wave but omitted in the second. Alpenkrone [6] was added to the questionnaire in the second wave.

The resulting database after the merge will contain all 6 brands in the ‘tom’ variable.

Case 4.

If the variable name is misspelled in the second wave (for example, 'topm' instead of 'tom'), DataTile won't be able to find a match for the 'tom' variable. As a result, the final database will include both 'tom' and 'topm' variables.

The 'tom' variable will have missing values for the second wave, while 'topm' won't have any values for the first wave. Although this error causes inconvenience, it's not as significant as in Case 2, because it's easily identifiable. The solution is to rename the misspelled variable in the source file and reload the second wave.
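Cases 2 and 4 can be caught before upload by comparing the new file's codebook with the one already in the database. The following is a hedged sketch: the function name preflight and the dictionary layout (variable name mapped to a code-to-label dictionary) are assumptions, and the near-miss detection relies on Python's difflib rather than on anything DataTile provides.

```python
import difflib


def preflight(database: dict, new_file: dict) -> list[str]:
    """Compare a new wave's codebook with the database before uploading.

    Both arguments map variable names to {code: label}.
    """
    warnings = []
    known = {name.lower(): name for name in database}

    for name, categories in new_file.items():
        match = known.get(name.lower())
        if match is None:
            # Possible Case 4: a misspelled name would create a new variable.
            close = difflib.get_close_matches(name.lower(), list(known), n=1)
            if close:
                warnings.append(
                    f"{name}: no match, did you mean {known[close[0]]}?")
            continue
        for code, label in categories.items():
            old = database[match].get(code)
            # Possible Case 2: same code, different label.
            if old is not None and old != label:
                warnings.append(
                    f"{name}[{code}]: '{old}' in database, '{label}' in file")
    return warnings
```

New codes, as in Case 3, produce no warning because adding categories is a legitimate change.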

Variable Type Matching

DataTile determines the type of each variable when a database is created. A variable's type can only be changed through a complete database reload, so it is best to keep variable types consistent across all waves.

DataTile classifies a variable as categorical when it has associated value labels in the SPSS file. Keep in mind that only those categories with defined labels will be generated in the DataTile database. Values without specified labels will be treated as SYSMIS and will not be incorporated.
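To see which values would be lost under this rule, it helps to list the values that have no label. The sketch below only mirrors the rule as described here. The function name classify and the "non-categorical" result are placeholders, since the text above does not say which type DataTile assigns to unlabelled variables.

```python
def classify(values: list, value_labels: dict) -> tuple[str, set]:
    """Return the inferred type and the values that lack a label.

    A variable with value labels is treated as categorical, and values
    without a label in such a variable would be treated as SYSMIS.
    """
    if not value_labels:
        return "non-categorical", set()
    unlabelled = {v for v in values
                  if v is not None and v not in value_labels}
    return "categorical", unlabelled
```

A non-empty set for a categorical variable means the source file needs additional value labels before loading.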

Naming Conventions

While it's feasible to map variables and values from SPSS files to different ones in the DataTile database (e.g., mapping "sex" to "gen", and "regions" to "geo"), maintaining consistency in names and category values is highly recommended.

The same logic should be applied to brand lists and category codes that are supposed to be tracked across waves. Keep tables of categories consistent for all waves.

Establish and manage a brand code table, expanding it with new codes whenever a new brand is added to the survey. Even if a brand is removed from monitoring, it's advised not to delete its corresponding code from the table to prevent false matches due to code reuse.

This guideline applies not just to brands, but also to all categorical lists that you aim to track over time.
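One way to enforce this rule is an append-only code table. The class below is a hypothetical illustration of the idea, not a DataTile feature: codes are assigned sequentially, and retiring a brand hides it from the active list without freeing its code.

```python
class CodeTable:
    """Append-only table of category codes; retired codes are never reused."""

    def __init__(self):
        self._codes = {}        # label -> code
        self._retired = set()   # labels no longer asked about

    def code_for(self, label: str) -> int:
        """Return the existing code for a label or assign the next free one."""
        if label not in self._codes:
            self._codes[label] = max(self._codes.values(), default=0) + 1
        return self._codes[label]

    def retire(self, label: str) -> None:
        """Stop monitoring a brand while keeping its code reserved."""
        self._retired.add(label)

    def active(self) -> dict:
        """Labels and codes of the brands still being monitored."""
        return {label: code for label, code in self._codes.items()
                if label not in self._retired}
```

Because a retired brand keeps its entry, a brand added later always receives a code that has never been used before.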

The naming convention for multi-response sets

DataTile can automatically manage multi-response sets based on naming patterns when new variables are added to the database.

Ensuring unique naming conventions for variables that comprise the same multi-response set is crucial for automation and human readability.

As a suggestion, use a common prefix for the names of variables within the same multi-response set, and append the brand code as a suffix.

For example, using a pattern like PA_1, PA_2, PA_3, ... is a clear and effective naming convention for the "Prompted Awareness" question. While any naming pattern can be used, applying it consistently within its specific multi-response set is essential.

An alternative pattern like QS12_B1, QS12_B2, QS12_B3, ... may work equally well for DataTile, but could be less intuitive for human interpretation.
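To illustrate why a consistent pattern matters for automation, the sketch below groups variable names by a shared prefix followed by a numeric code, as in PA_1, PA_2, PA_3. The function name and the regular expression are assumptions. DataTile's own matching rules are not documented here, and a pattern such as QS12_B1 would need a different expression.

```python
import re
from collections import defaultdict

# Prefix, underscore, numeric code suffix, e.g. PA_1 -> prefix "PA", code 1
SET_MEMBER = re.compile(r"(?P<prefix>.+)_(?P<code>\d+)")


def group_multi_response(names: list[str]) -> dict[str, dict[int, str]]:
    """Group variable names into candidate sets keyed by their common prefix."""
    sets = defaultdict(dict)
    for name in names:
        m = SET_MEMBER.fullmatch(name)
        if m:
            sets[m["prefix"]][int(m["code"])] = name
    # A multi-response set needs at least two member variables.
    return {prefix: members for prefix, members in sets.items()
            if len(members) > 1}
```

Names that break the pattern simply fall out of the grouping, which is what happens when one variable in a set is named inconsistently.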

Labeling

When labeling dichotomous variables, DataTile expects the labels of Multi-Response (MR) set variables to have a common part, typically the question text, and a unique part, which is treated as the category label. Use a hyphen to separate the question text from the category label for better processing: Question text - Category.
For example:

Which of the following soft drink brands have you heard of? - AquaZen
Which of the following soft drink brands have you heard of? - Wellspring
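The split between the common and unique parts of such labels can be sketched as follows. This is an illustration of the convention, not DataTile's algorithm. The function name split_set_labels is hypothetical, and the code assumes every label in the set uses the " - " separator.

```python
import os

SEPARATOR = " - "


def split_set_labels(labels: list[str]):
    """Split the labels of one multi-response set into question and categories.

    Returns (question, categories), or None if the labels do not share
    a "Question text - " prefix.
    """
    # commonprefix compares the strings character by character.
    common = os.path.commonprefix(labels)
    cut = common.rfind(SEPARATOR)
    if cut == -1:
        return None
    start = cut + len(SEPARATOR)
    return common[:cut], [label[start:] for label in labels]
```

Cutting back to the last separator inside the common prefix keeps categories intact even when several of them begin with the same letters.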

The naming convention for matrix scales

Here’s an example. Let’s assume we ask respondents to rate each statement for each brand.

Q23. How would you rate your overall satisfaction with your stay at our hotel?

So we have statements and brands (which should be consistently coded over the waves, as previously outlined).

Responses to these questions will be recorded in N = [No of brands] x [No of statements] variables. We advise defining labels for the values of these variables.

Here's a proposed naming convention for this set of questions:

Matrix for Brand 1: Q23_B1_S1, Q23_B1_S2, Q23_B1_S3, Q23_B1_S4, Q23_B1_S5
Matrix for Brand 2: Q23_B2_S1, Q23_B2_S2, Q23_B2_S3, Q23_B2_S4, Q23_B2_S5
and so forth.

In this scenario, we iterate across two lists (brands and statements), forming a consistent pattern that makes it easy to associate each variable with a respondent's choice.
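The iteration over the two lists can be written out directly. This is a small sketch with a hypothetical helper name. It reproduces the Q23_B#_S# pattern proposed above and shows that the number of variables equals the number of brands times the number of statements.

```python
from itertools import product


def matrix_names(question: str, brands: list[int],
                 statements: list[int]) -> list[str]:
    """Build variable names for every brand and statement combination."""
    return [f"{question}_B{b}_S{s}" for b, s in product(brands, statements)]


names = matrix_names("Q23", brands=[1, 2], statements=[1, 2, 3, 4, 5])
# 2 brands x 5 statements = 10 variables, starting with Q23_B1_S1
```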

Labeling

When labeling the variables, employ a pattern akin to the one used for naming them. For example, if you have variables for each brand and statement, construct labels similarly, such as Brand - Statement. This way, DataTile can accurately assign the appropriate labels for the entities.

The naming convention for the question matrix

This case is akin to the previous one, but we measure Attitudes, Statements, Brand Image, or Attribution instead of a scale.

This data will be captured as a set of dichotomous variables. As before, we recommend using a pattern to match each variable to its corresponding statement and brand by name.

In more complex scenarios, you might have a set of such tables for each product, for instance when asking about attitudes towards products manufactured by the brands.

In such cases, a three-tier naming pattern should be employed for your questions to iterate over brands, products, and statements.

Here's a potential pattern for such a situation: QS24_BR#_P#_S#, where # is the code of the respective category.

The number of variables to capture responses would be
N = [No of brands] * [No of products] * [No of statements].
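A short sketch of the three-tier pattern follows: it generates the names and parses a name back into its brand, product, and statement codes. The helper names are hypothetical, and the QS24 prefix is taken from the example pattern above.

```python
import re
from itertools import product

THREE_TIER = re.compile(r"QS24_BR(\d+)_P(\d+)_S(\d+)")


def three_tier_names(brands, products, statements) -> list[str]:
    """Build one variable name per brand, product, and statement combination."""
    return [f"QS24_BR{b}_P{p}_S{s}"
            for b, p, s in product(brands, products, statements)]


def parse_three_tier(name: str):
    """Return (brand, product, statement) codes, or None if the name differs."""
    m = THREE_TIER.fullmatch(name)
    return tuple(int(g) for g in m.groups()) if m else None
```

With 2 brands, 3 products, and 4 statements, three_tier_names returns 2 * 3 * 4 = 24 names, in line with the formula above.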

Labeling

When labeling the variables, employ a pattern akin to the one used for naming them.

For example, if you have variables for each brand and statement, construct labels similarly, such as Brand - Statement.

Similarly, if you have three-tier questions, construct labels using the three-tier Brand - Product - Statement pattern.

By aligning the labels with the variable naming convention, DataTile can effectively assign the appropriate labels to the corresponding entities.
