Getting started with DataTile codebook

What is a codebook for?

Maintaining consistent coding in the questionnaires is crucial for tracking studies that span multiple years, encompass various geographic regions, and involve different research agencies or fieldwork suppliers.

To achieve this, a clear, concise, and intuitive data map (a.k.a. codebook) is paramount. The structure and notations used in the data map should enable seamless navigation between the codebook, the questionnaire, and the raw data.

The codebook serves as the central point of truth about the questionnaire for researchers, data engineers, and analysts. It keeps question naming consistent and reduces the risk of category confusion across study waves, markets, and fieldwork suppliers. Establishing and adhering to a standardized coding framework makes data management and analysis more efficient and dependable.

At DataTile, drawing from our extensive international experience, we have formulated best practices and devised a comprehensive approach for designing a data map. By following our recommended practices, you can create a codebook that meets the aforementioned challenges and provides a robust foundation for data management and analysis.

The DataTile team is committed to developing this guideline into a comprehensive specification. Additionally, we are actively working on creating open-source tools and instruments for data processing to advance the market research industry. Our goal is to contribute to developing accessible and innovative solutions that benefit researchers and enhance data management practices in the field.

What is a DataTile codebook?

DataTile automatically derives the schema from loaded data. It detects variable types from the content, fetches value labels for categorical variables, etc.

However, it's important to note that automatic detection may not always be perfect, especially when dealing with poorly labeled SPSS data or files in textual formats like Excel or CSV.

In such cases, a codebook or data map becomes essential. It contains additional metadata that amends DataTile’s automatic processing of loaded data. With a codebook, you can define variable types, labels, and categories, create a folder hierarchy, transform data, and so on.

The codebook is also an instrument of validation and control, which is essential for complicated tracking studies.

Here is a simple example of how a codebook works. We take a dummy project with the following materials:

The questionnaire contains the main types of questions and data structures in surveys.
The codebook on the second tab specifies the corresponding data structure and variable types.
The third tab of the document comprises a dataset containing 100 simulated interviews conducted based on the questionnaire.

You can look at the Google Sheet with the dummy data directly via this link.

How to Load Data with Codebook

It is as easy as this:

Pack the data file (SPSS, XLS, XLSX, CSV, other) together with the codebook in a ZIP archive. The ZIP bundle should contain the data and the codebook files in its root and nothing else.
The codebook file should be in Excel format and have the secondary extension *.dtcb.xlsx. The secondary extension dtcb stands for "DataTile codebook" and allows DataTile to distinguish the codebook from the data.
You can use this ZIP bundle as a sole data file and create, reload, or append a database in DataTile by loading it via the interface or the API.
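
The packing step above can be sketched in Python with the standard zipfile module. This is only an illustration: the helper name build_bundle and the file names in the usage note are hypothetical, and the exact, case-sensitive check on the extension is an assumption rather than documented DataTile behavior.

```python
import zipfile
from pathlib import Path

CODEBOOK_SUFFIX = ".dtcb.xlsx"  # secondary extension described above


def build_bundle(data_file: Path, codebook_file: Path, bundle_path: Path) -> Path:
    """Pack one data file and one codebook into a flat ZIP bundle.

    Hypothetical helper: it only enforces the rules stated in this article
    (codebook named *.dtcb.xlsx, both files in the archive root, nothing else).
    """
    if not codebook_file.name.endswith(CODEBOOK_SUFFIX):
        raise ValueError(f"codebook must be named *{CODEBOOK_SUFFIX}: {codebook_file.name}")
    if data_file.name.endswith(CODEBOOK_SUFFIX):
        raise ValueError(f"data file must not use the codebook extension: {data_file.name}")

    with zipfile.ZipFile(bundle_path, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
        # arcname is the bare file name, so both entries land in the archive root.
        bundle.write(data_file, arcname=data_file.name)
        bundle.write(codebook_file, arcname=codebook_file.name)
    return bundle_path
```

For example, build_bundle(Path("wave1.sav"), Path("wave1.dtcb.xlsx"), Path("wave1.zip")) would produce a bundle that holds exactly those two files in its root.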

Variable Type in DataTile Codebook

TYPE	Data nature	Description
Type of variables
CAT	Categorical	Categorical variable usually represented by a single column in a raw data set
NUM	Numeric	Numerical variable. For example, Age
TXT	Text	Open-ended questions or textual data
DATE	Date or Timestamp	Date/Time. When parsing textual formats the following template is assumed: yyyy[-mm[-dd[ hh:ss]]]
WT	Weight	Factor assigned to each respondent to correct for biases that may occur due to unequal probability of selection.
PROB	Probability	Numeric variable whose values represent a respondent's personal probability of being reached by media. Employed in media planning.
Structures
DIC	Dictionary	Reusable template of a categorical variable
MD	Dichotomous multi-response	Represents a question where the respondent may select multiple options. Each option is represented in the data by a dichotomous variable.
MC	Categorical multi-response	Represents a question where the respondent may select multiple options. Each answer is represented in the data by a categorical variable.
ARR	Array or Matrix question	Usually displayed as a table in a questionnaire, e.g. Brand Image or Attribution questions. An ARR declaration helps to build multi-level categories in the codebook quickly and clearly.
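
The difference between the MD and MC layouts is easier to see on a small example. The sketch below uses a made-up question with three options coded 1 to 3. The column names (Q5_A1, Q5_opt1, and so on) and the 0/1 coding of the dichotomous variables are assumptions chosen for illustration, not DataTile requirements.

```python
# Hypothetical question "Q5: Which brands do you know?" with options coded 1..3.
OPTION_CODES = [1, 2, 3]

# MC layout: one categorical column per answer slot; None marks an unused slot.
mc_rows = [
    {"Q5_A1": 1, "Q5_A2": 3, "Q5_A3": None},     # respondent chose options 1 and 3
    {"Q5_A1": 2, "Q5_A2": None, "Q5_A3": None},  # respondent chose option 2 only
]


def mc_to_md(row, prefix="Q5"):
    """Convert one MC row into the equivalent MD row (one 0/1 column per option)."""
    chosen = {value for value in row.values() if value is not None}
    return {f"{prefix}_opt{code}": int(code in chosen) for code in OPTION_CODES}


# MD layout: one dichotomous column per option.
md_rows = [mc_to_md(row) for row in mc_rows]
# md_rows[0] == {"Q5_opt1": 1, "Q5_opt2": 0, "Q5_opt3": 1}
# md_rows[1] == {"Q5_opt1": 0, "Q5_opt2": 1, "Q5_opt3": 0}
```

Both layouts carry the same answers. MD needs as many columns as there are options, while MC needs as many columns as the maximum number of answers a respondent can give.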

Reserved Column Names

You can keep virtually any columns in a codebook for your own needs, such as mapping or translation, except those mentioned in this documentation and the following reserved names:

_VAR_
_TYPE_
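
Before adding your own columns, a quick check against the reserved names can catch accidental clashes. The sketch below is a hypothetical helper, not part of DataTile. It covers only the two names listed above and assumes an exact, case-sensitive match; other column names used elsewhere in the documentation would have to be added to the set.

```python
# Reserved names listed in this section; extend with other documented column names.
RESERVED_COLUMNS = {"_VAR_", "_TYPE_"}


def find_reserved_clashes(custom_columns):
    """Return the custom column names that collide with reserved names."""
    return [name for name in custom_columns if name in RESERVED_COLUMNS]


# Hypothetical custom columns kept for notes and translation.
clashes = find_reserved_clashes(["Notes", "_TYPE_", "Label_DE"])
# clashes == ["_TYPE_"]
```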
