Many surveys contain missing values, also known as 'empty cells', meaning a cell in the data set lacks a value. This often arises when a respondent hasn't been posed a specific question.
Depending on the domain, missing values may be referenced as
NULL (software development)
Blanks (Excel)
SYSMIS (SPSS), etc.
There are two ways to make calculations for variables with missing values.
1. Ignore empty cells in calculations
You can calculate metrics without taking into account empty cells. In this case, any % will be calculated only from the respondents who were asked the question, thus ignoring those with missing values.
Here is a scenario using the holiday abroad example:
Total sample: 1000
People who said yes to holiday abroad: 500
People who travelled by air: 300
To calculate people travelled by air (without including the empty cells) the calculation is 300/500*100 = 60%.
2. Include empty cells in calculations
Alternatively, you could calculate metrics based on all cells (including empty cells). In this case, any % will be calculated from all respondents on the survey, regardless of whether they were asked the question.
Here is the same holiday survey example displaying this calculation:
Total sample: 1000
People who said yes to holiday abroad: 500
People who travelled by air: 300
To calculate people who travelled by air (including empty cells) the calculation is 300/1000*100 = 30%
You will see from above, there is a large difference between the figures for making decisions. To improve accuracy it makes sense to calculate the % only from those who were asked the question by excluding empty cells.
How to set up empty cells calculation in DataTile
The way you handle missing values in a survey can significantly affect your analysis, including percentages, significance levels, and indexes. When setting up a survey in DataTile, you can choose how to deal with these empty or missing responses in the project settings, with the default option detailed below.
Ignoring missing values when calculating base is the default strategy in DataTile. This is the most popular approach in Market Research data.
This means that for each calculation, only responses from participants who answered the question are considered. Respondents with missing values (due to not answering the question) are not included in these calculations. This is the standard setup for all variables in the project settings.
If you want to include missing values in calculations, you can untick the option.
You can customize this for a particular variable using the Meta-Editor.
Choose any variable;
Click on the 'Derive value' icon on the right ;
Select either an ‘ignore missing values' option or a 'count missing values’ option for each variable as shown below.
Be aware, that including missing values changes the calculation of percentages, significance, indexes etc.
There is also a functionality called rebasing, that can be applied on the fly when building a report. This allows you to filter any calculations for any column or row through a particular variable e.g. a different audience.