Preparing SPSS files: do’s and don’ts
When providing data to the DataTile service team, please always supply the questionnaire with the data file. This helps our experts interpret variables correctly and saves time.
Naming conventions
File Encoding
Preferred encoding: UTF-8 for all labels and open-ended text.
Some third-party tools may not declare encoding explicitly. In such cases, UTF-8 will be assumed.
If you use another encoding, ensure it is declared in the SPSS file so DataTile can read it correctly.
Refer to the list of supported encodings if needed.
Variable Naming Rules
Must start with a Latin letter.
May include:
Latin letters (A–Z, a–z)
Numbers (0–9)
Symbols:
_ . @ - $
Must not contain:
Spaces
Non-Latin characters (e.g., Cyrillic, Chinese, Arabic, Hebrew)
Names are not case-sensitive, so DataTile reads "Gender" as "gender" regardless of case.
Prefer meaningful names linked to the questionnaire for easier cross-reference.
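The naming rules above can be checked before a file is sent. The following is a minimal sketch, not a DataTile tool: the function name check_variable_names and the regular expression are assumptions derived from the rules as listed, and the variable names are assumed to be available as a plain list of strings.

```python
import re

# Assumed pattern, built from the rules above: a Latin letter first,
# then Latin letters, digits, or any of the symbols _ . @ - $
VALID_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_.@$-]*")


def check_variable_names(names):
    """Return (invalid_names, case_collisions) for a list of variable names."""
    # Names that break the character rules (spaces, non-Latin letters, bad first character)
    invalid = [n for n in names if not VALID_NAME.fullmatch(n)]

    # Names that differ only by case would be read as the same variable
    seen = {}
    collisions = []
    for n in names:
        key = n.lower()
        if key in seen and seen[key] != n:
            collisions.append((seen[key], n))
        else:
            seen.setdefault(key, n)
    return invalid, collisions


invalid, collisions = check_variable_names(
    ["Gender", "gender", "Q8_1", "1age", "my var", "возраст"]
)
```

In this example, invalid holds the three names that break the character rules, and collisions holds the pair "Gender" and "gender", which differ only by case.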
Labels
Variable labels - if a label is not provided, DataTile will use the variable name as a label.
Value labels - explicitly declare value labels for categorical variables to ensure DataTile recognises them correctly. DataTile ignores undeclared categories, treating them as missing values (SYSMIS).
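Because undeclared categories are treated as missing, it can help to list them before delivery. The sketch below assumes the observed values of one variable and its declared value labels have already been read into a plain list and a dictionary; the function name undeclared_values is hypothetical.

```python
def undeclared_values(values, value_labels):
    """Return the sorted distinct values that have no declared value label.

    values: observed values of one categorical variable (None = already missing)
    value_labels: dict mapping each declared value to its label
    """
    return sorted({v for v in values if v is not None and v not in value_labels})


# Hypothetical variable with two declared categories and two stray codes
observed = [1, 2, 3, None, 2, 5]
declared = {1: "Male", 2: "Female"}
stray = undeclared_values(observed, declared)
```

Any value returned here would need a declared label in the SPSS file, or it will be read as missing.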
Multi-Response and Matrix Questions
Use simple, consistent patterns such as Q8_1, Q8_2, Q8_3, etc.
Use _99 or _999 for “Other” options (e.g., Q8_99).
Append the _OE or _T suffix to the names of open-ended “Other” variables to show that they belong to the set of categorical variables. For example, the text variable Q8_99_OE is clearly related to the set of categorical variables Q8_*.
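To illustrate why a consistent pattern matters, the sketch below groups variable names into question sets by name alone. The regular expression and the function name group_by_question are assumptions made for this example; they do not describe DataTile's own matching logic.

```python
import re
from collections import defaultdict

# Assumed pattern: <question>_<code>, with an optional _OE or _T text suffix
ITEM = re.compile(
    r"(?P<question>[A-Za-z][A-Za-z0-9]*)_(?P<code>\d+)(?:_(?P<text>OE|T))?"
)


def group_by_question(names):
    """Group names like Q8_1, Q8_99, Q8_99_OE under their question (Q8)."""
    groups = defaultdict(lambda: {"categorical": [], "text": []})
    for name in names:
        m = ITEM.fullmatch(name)
        if m is None:
            continue  # not part of a multi-response or matrix set
        kind = "text" if m.group("text") else "categorical"
        groups[m.group("question")][kind].append(name)
    return dict(groups)


sets = group_by_question(["Q8_1", "Q8_2", "Q8_99", "Q8_99_OE", "gender"])
```

Here Q8_1, Q8_2 and Q8_99 land in the categorical list for Q8, Q8_99_OE lands in its text list, and gender is left out because it does not follow the pattern.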
Upholding Automated Matching for Tracking Studies
DataTile automatically matches and tracks data across waves or benchmarks, assuming variable names and category values are consistent.
Keep variable names consistent across the waves and datasets you intend to combine in analysis or reporting. For example, use gender in every wave, not sex in one and gender in another.
DataTile can also integrate inconsistent datasets by means of an explicit map of variables and labels. For more information, contact our service team.
Do not reuse variable names for new questions, even if the old question has been removed from the survey. Doing so may produce erroneous trends.
Do not reuse category values - if a value (e.g., 1=Tomahawk) was used in a previous wave, don’t assign 1 to a new category (e.g., 1=Banana).
Using consistent, recognizable naming patterns for related variables enables DataTile’s automated processing to align your data correctly and efficiently.
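The consistency rules for tracking studies can be checked by comparing two waves before delivery. This is a minimal sketch under stated assumptions: each wave is represented as a dictionary mapping a variable name to its value labels, and the function name compare_waves and the report keys are hypothetical.

```python
def compare_waves(previous, current):
    """Compare two waves, each given as {variable_name: {value: label}}.

    Names are compared case-insensitively, matching the naming rules.
    """
    prev = {name.lower(): labels for name, labels in previous.items()}
    curr = {name.lower(): labels for name, labels in current.items()}
    report = {
        "dropped": sorted(prev.keys() - curr.keys()),  # only in the previous wave
        "new": sorted(curr.keys() - prev.keys()),      # only in the current wave
        "relabelled": [],                              # same value, different label
    }
    for name in sorted(prev.keys() & curr.keys()):
        for value in sorted(prev[name].keys() & curr[name].keys()):
            if prev[name][value] != curr[name][value]:
                report["relabelled"].append(
                    (name, value, prev[name][value], curr[name][value])
                )
    return report


wave1 = {"gender": {1: "Male", 2: "Female"}, "Q1": {1: "Tomahawk"}}
wave2 = {"Gender": {1: "Male", 2: "Female"}, "Q1": {1: "Banana"}, "sex": {1: "Male"}}
report = compare_waves(wave1, wave2)
```

A relabelled entry is not always a reused value (it may be a corrected typo), and a new variable may be a rename of an existing one, so both lists are best treated as items to review.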