Preparing SPSS files: do’s and don’ts
Please supply the questionnaire together with the data file. It saves our experts time and helps avoid misinterpreting variables in the data.
Key points
UTF-8 is the preferred encoding for labels and open-ended questions. Note that third-party libraries for generating SPSS files may not declare the text encoding in the file itself.
Ensure that the default encoding is UTF-8. If another encoding is used, explicitly declare it in the SPSS file before sending it.
Variable names may contain only:
Latin letters
Numbers [0-9]
Symbols [ _ . @ - $ ]
Variable names must not contain:
Spaces of any kind
Characters from non-Latin scripts such as Cyrillic, hieroglyphic characters, Arabic, Hebrew, etc.
Preferably start the name with a letter, not with a number or symbol.
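The naming rules above can be checked before a file is sent. The following is a minimal sketch, not part of DataTile: the function name check_variable_name and the wording of the messages are hypothetical, and the allowed character set is taken directly from the lists above.

```python
import re

# Allowed characters per the rules above: Latin letters, digits, and _ . @ - $
ALLOWED_NAME = re.compile(r"[A-Za-z0-9_.@\-$]+")


def check_variable_name(name: str) -> list:
    """Return a list of problems found in a variable name (empty if none)."""
    if not name:
        return ["name is empty"]
    problems = []
    # str.isspace() also catches tabs and non-breaking spaces
    if any(ch.isspace() for ch in name):
        problems.append("contains whitespace")
    if not ALLOWED_NAME.fullmatch(name):
        problems.append("contains characters outside A-Z, a-z, 0-9, _ . @ - $")
    # Starting with a letter is a recommendation, not a hard rule
    if not (name[0].isascii() and name[0].isalpha()):
        problems.append("does not start with a Latin letter (recommended)")
    return problems


for candidate in ["Q8_1", "Q 8", "8Q"]:
    print(candidate, check_variable_name(candidate))
```

A name that returns an empty list satisfies all three checks; a name containing a space is reported twice, once for the whitespace and once for the disallowed character.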
Automatic tracking requires consistent variable names. A variable should keep the same name across waves. Names like 'gender' (in the 1st wave) and 'sex' (in the 2nd wave) will be treated as different variables.
Variable names are not case-sensitive. For example, ‘Gender’ and ‘gender’ are considered to be the same variable.
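One way to catch renamed variables before uploading a new wave is to compare the name lists of two waves while ignoring case. This is an illustrative sketch only; compare_wave_names and the sample lists are hypothetical, and the names are assumed to have been extracted from the files already.

```python
def compare_wave_names(wave1_names, wave2_names):
    """Compare variable names from two waves, ignoring case.

    Returns (only_in_wave1, only_in_wave2) as sorted lists of lowercased names.
    """
    first = {name.lower() for name in wave1_names}
    second = {name.lower() for name in wave2_names}
    return sorted(first - second), sorted(second - first)


# Hypothetical variable lists for two waves
wave1 = ["Gender", "Age", "Q8_1"]
wave2 = ["sex", "age", "Q8_1"]
only_first, only_second = compare_wave_names(wave1, wave2)
print(only_first)   # ['gender']
print(only_second)  # ['sex']
```

Here 'Age' and 'age' match because case is ignored, while 'Gender' and 'sex' are flagged as names that exist in only one wave.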
Label categorical variables. Values without labels are ignored when the file is loaded into DataTile and are treated as SYSMIS. Variables without any labels are read as numerical.
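To see which values would be lost, it can help to list the values of a categorical variable that have no label. The sketch below assumes the values and the value labels have already been read into a plain Python list and dict, with None standing for a missing value; unlabeled_values and the sample data are hypothetical.

```python
def unlabeled_values(values, value_labels):
    """Return the distinct non-missing values that have no label, sorted."""
    return sorted({v for v in values if v is not None and v not in value_labels})


# Hypothetical data for one categorical variable
gender_values = [1, 2, 2, 3, None, 1]
gender_labels = {1: "Male", 2: "Female"}
print(unlabeled_values(gender_values, gender_labels))  # [3]
```

Any value reported here should either receive a label or be recoded before the file is sent.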
Assign meaningful names to variables that reference the corresponding questions in the questionnaire. This makes it easier to match variables to questions.
Naming conventions for automated processing
We recommend using simple patterns in the names of multi-response questions, for example, Q8_1, Q8_2, Q8_3, etc. If the question has an ‘Other’ option with an open-ended answer, it is preferable to name it Q8_99 or Q8_999 to simplify data processing.
Categorical variables corresponding to answers from the ‘Other’ category should be named like Q8_99_OE, where OE is short for ‘Open-ended’. This makes it easier to identify the open part of the ‘Other’ category and keeps it from being mixed with closed questions.
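The benefit of this pattern is that variables can be grouped mechanically. The following sketch shows one possible way to do so; the regular expression is an assumption derived from the examples Q8_1, Q8_99 and Q8_99_OE, not a DataTile specification, and group_multi_response is a hypothetical helper.

```python
import re
from collections import defaultdict

# Matches names such as Q8_1, Q8_99, Q8_99_OE (assumed pattern, see lead-in)
MULTI_RESPONSE = re.compile(
    r"^(?P<question>[A-Za-z]+\d+)_(?P<option>\d+)(?P<oe>_OE)?$",
    re.IGNORECASE,
)


def group_multi_response(names):
    """Group variable names by question into closed and open-ended parts."""
    groups = defaultdict(lambda: {"closed": [], "open_ended": []})
    for name in names:
        match = MULTI_RESPONSE.match(name)
        if match is None:
            continue  # not part of a multi-response block
        key = "open_ended" if match.group("oe") else "closed"
        # Upper-case the question id because names are not case-sensitive
        groups[match.group("question").upper()][key].append(name)
    return dict(groups)


names = ["Q8_1", "Q8_2", "Q8_3", "Q8_99", "Q8_99_OE", "Age"]
print(group_multi_response(names))
# {'Q8': {'closed': ['Q8_1', 'Q8_2', 'Q8_3', 'Q8_99'], 'open_ended': ['Q8_99_OE']}}
```

With the _OE suffix in place, the open-ended part of the ‘Other’ option lands in its own list and never mixes with the closed options.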