Getting Started with DataTile Codebook

What is a Codebook for?

Maintaining consistent coding in the questionnaires is crucial for tracking studies that span multiple years, encompass various geographic regions, and involve different research agencies or fieldwork suppliers.

To achieve this, a clear, concise, and intuitive data map (a.k.a. codebook) is paramount. The structure and notations used in the data map should enable seamless navigation between the codebook, the questionnaire, and the raw data.

The codebook serves as the single source of truth about the questionnaire for researchers, data engineers, and analysts. It keeps question naming consistent and reduces the risk of category confusion across study waves, markets, and fieldwork suppliers. Establishing and adhering to a standardized coding framework makes data management and analysis more efficient and dependable, because every processing and analysis task can rely on the same reference.

At DataTile, drawing from our extensive international experience, we have formulated best practices and devised a comprehensive approach for designing a data map. By following our recommended practices, you can create a codebook that meets the aforementioned challenges and provides a robust foundation for data management and analysis.

The DataTile team is committed to developing this guideline into a comprehensive specification. Additionally, we are actively working on creating open-source tools and instruments for data processing to advance the market research industry. Our goal is to contribute to developing accessible and innovative solutions that benefit researchers and enhance data management practices in the field.

What is DataTile Codebook?

DataTile automatically derives the schema from loaded data. Among other things, it detects variable types from the content and fetches value labels for categorical variables.

However, automatic detection is not always perfect, especially with poorly labeled SPSS data or files in textual formats like Excel or CSV.

In such cases, a codebook or data map becomes essential. It contains additional metadata that amends DataTile’s automatic processing of loaded data. With a codebook, you can define variable types, labels, and categories, create a folder hierarchy, transform data, and so on.

The codebook is also an instrument of validation and control, which is essential for complicated tracking studies.
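To see why textual formats benefit from extra metadata, consider the minimal sketch below. It is an illustration of the general problem only: the `guess_type` helper, the `METADATA` dictionary, and the sample columns are all hypothetical and do not reflect DataTile's detection logic or the actual codebook layout.

```python
import csv
import io

# Hypothetical CSV export: "region" holds category codes, not quantities.
RAW_CSV = """id,region,age
1,2,34
2,1,51
3,3,27
"""

# Hypothetical metadata for illustration; NOT the DataTile codebook layout.
METADATA = {
    "region": {
        "type": "categorical",
        "labels": {"1": "North", "2": "South", "3": "East"},
    },
    "age": {"type": "numeric"},
}


def guess_type(values):
    """Naive content-based guess: all-integer columns look numeric."""
    return "numeric" if all(v.lstrip("-").isdigit() for v in values) else "text"


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Content alone cannot tell a coded category from a number.
guessed = {name: guess_type(values) for name, values in columns.items()}

# Metadata overrides the guess wherever it has an entry.
resolved = {
    name: METADATA.get(name, {}).get("type", guessed[name]) for name in columns
}
region_labels = [METADATA["region"]["labels"][v] for v in columns["region"]]
```

Here the naive guess treats `region` as numeric because its values are digits, while the supplied metadata marks it as categorical and attaches readable labels.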

Here is a simple example of how a codebook works. We take a dummy project with the following materials:

  • The questionnaire contains the main types of questions and data structures in surveys.

  • The codebook on the second tab specifies the corresponding data structure and variable types.

  • The third tab of the document comprises a dataset containing 100 simulated interviews conducted based on the questionnaire.

You can look at the Google Sheet with the dummy data directly via this link.

How to Load Data with Codebook

It is as easy as this:

  1. Pack the data file (SPSS, XLS, XLSX, CSV, other) together with the codebook in a ZIP archive. The ZIP bundle should contain the data and the codebook files in its root and nothing else.

  2. The codebook file should be in Excel format and have the secondary extension *.dtcb.xlsx. The secondary extension dtcb stands for "DataTile codebook" and allows DataTile to distinguish data and the codebook.

  3. You can use this ZIP bundle as the sole data file to create, reload, or append to a database in DataTile, loading it via the interface or the API.
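The packing steps above can be sketched with Python's standard `zipfile` module. This is a minimal sketch, not an official DataTile tool: the `build_bundle` function and the file names in the usage note are hypothetical, and uploading the bundle is left out because the API calls are not described here.

```python
import zipfile
from pathlib import Path


def build_bundle(data_file: Path, codebook_file: Path, bundle_path: Path) -> Path:
    """Pack one data file and one codebook into the root of a ZIP archive."""
    # The codebook is recognized by its secondary extension.
    if not codebook_file.name.lower().endswith(".dtcb.xlsx"):
        raise ValueError("codebook file name must end with .dtcb.xlsx")

    with zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        # Using the bare file name as arcname drops any directory part,
        # so both files land in the archive root and nothing else is added.
        bundle.write(data_file, arcname=data_file.name)
        bundle.write(codebook_file, arcname=codebook_file.name)
    return bundle_path
```

For example, `build_bundle(Path("wave1.sav"), Path("wave1.dtcb.xlsx"), Path("wave1.zip"))` would produce an archive containing exactly those two files at its root.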

Variable Type in DataTile Codebook


Data nature and type of variables:

  • Categorical variable, usually represented by a single column in a raw data set.

  • Numerical variable. For example, Age.

  • Open-ended questions or textual data.

  • Date or Timestamp: Date/Time. When parsing textual formats, the following template is assumed: yyyy[-mm[-dd[ hh:ss]]]

  • Factor assigned to each respondent to correct for biases that may occur due to unequal probability of selection.

  • Numeric variable whose values represent a respondent's personal probability of being reached by media. Employed in media planning.

  • Reusable template of a categorical variable.

  • Represents a question where the respondent may select multiple options. Each option is represented in the data by a dichotomous variable.

  • Represents a question where the respondent may select multiple options. Each answer is represented in the data by a categorical variable.

  • Array or Matrix question: usually displayed as a table in a questionnaire, e.g. Brand Image or Attribution questions.
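The two multiple-response representations listed above differ only in how the same answers are laid out in columns. The sketch below encodes one hypothetical question both ways; the brand names, the 1-based option codes, and the padding with `None` are assumptions made for illustration, not DataTile conventions.

```python
# Hypothetical answers to "Which brands do you know?" for three respondents.
OPTIONS = ["Brand A", "Brand B", "Brand C"]
answers = [["Brand A", "Brand C"], ["Brand B"], []]


def to_dichotomous(selected, options):
    """One 0/1 column per option: 1 if the option was selected."""
    return [1 if option in selected else 0 for option in options]


def to_categorical(selected, options):
    """One column per answer slot holding a 1-based option code, padded with None."""
    codes = [options.index(option) + 1 for option in selected]
    return codes + [None] * (len(options) - len(codes))


dichotomous = [to_dichotomous(a, OPTIONS) for a in answers]
categorical = [to_categorical(a, OPTIONS) for a in answers]
```

In the dichotomous layout every row has one flag per option, while in the categorical layout each row lists the codes of the chosen options and leaves unused answer slots empty.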
