Advanced Z-test with Audience Overlap Detection
This article is part of the DataTile series on Significance Testing.
· Unpooled variance · Overlap-adjusted standard error · Supports weighting
Why Adjust for Audience Overlap
When comparing proportions across groups, a traditional Z-test assumes the groups are completely independent, that is, that no respondent appears in both groups. In practice, this assumption is often violated. Consider comparing Male vs Total: because Total includes all males, the two groups partially overlap, which violates the independence assumption and can distort results:
Group correlation: Shared respondents create dependencies between the groups
Underestimated standard error: The overlap reduces variability, making the difference appear more precise than it is
Inflated statistical significance: The p-value becomes artificially low, increasing the chance of false positives
To ensure valid results, it’s essential to adjust for audience overlap when comparing proportions. DataTile handles this automatically using a modified z-test tailored for overlapping groups.
How the Adjustment Works
In DataTile, the Advanced Test for Audience Overlap is applied only when comparing proportions (e.g., % awareness, % usage). It does not apply to comparisons of means or numeric values.
DataTile's modified z-test replaces the assumption of full independence with a more realistic structure, one that reflects partial audience overlap and weighting. The advanced formula does the following:
Divides the sample into three non-overlapping subgroups:
- respondents unique to Group 1,
- respondents unique to Group 2,
- overlapping respondents (present in both groups).
Weights each subgroup individually using the sum of squared respondent weights, ensuring accurate estimation even in complex weighted samples.
Applies a correction factor to the overlapping portion to adjust for correlation and imbalance between groups.
Combines all three components to compute an adjusted standard error, which reflects the true audience structure and avoids underestimating variability.
This method ensures that the test remains statistically robust, avoiding inflated significance and providing fair comparisons, even in complex segment structures.
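The first two steps above (splitting the sample and summarizing each subgroup) can be sketched in Python. This is a minimal illustration, not DataTile's implementation: the function names, the input layout `(in_group1, in_group2, weight, success)`, and the use of a weighted proportion are all assumptions made for the example. It stops short of the overlap correction and the adjusted standard error.

```python
from dataclasses import dataclass


@dataclass
class SubgroupStats:
    base: int          # unweighted respondent count
    sum_w: float       # sum of respondent weights
    sum_w_sq: float    # sum of squared respondent weights
    proportion: float  # weighted share of "success" (0.0 if the subgroup is empty)


def summarize(rows):
    """rows: list of (weight, success) pairs for one subgroup."""
    sum_w = sum(w for w, _ in rows)
    sum_w_sq = sum(w * w for w, _ in rows)
    hits = sum(w for w, success in rows if success)
    proportion = hits / sum_w if sum_w else 0.0
    return SubgroupStats(len(rows), sum_w, sum_w_sq, proportion)


def split_by_overlap(respondents):
    """respondents: iterable of (in_group1, in_group2, weight, success).

    Hypothetical layout chosen for this sketch. Returns statistics for
    the three non-overlapping subgroups.
    """
    parts = {"only_1": [], "only_2": [], "overlap": []}
    for in_1, in_2, weight, success in respondents:
        if in_1 and in_2:
            parts["overlap"].append((weight, success))
        elif in_1:
            parts["only_1"].append((weight, success))
        elif in_2:
            parts["only_2"].append((weight, success))
        # Respondents in neither group do not enter the comparison.
    return {name: summarize(rows) for name, rows in parts.items()}


# Male (group 1) vs Total (group 2): every male is also in Total,
# so "only_1" is empty and all males fall into "overlap".
respondents = [
    (True, True, 1.0, True),
    (True, True, 2.0, False),
    (False, True, 1.5, True),
    (False, True, 0.5, False),
]
stats = split_by_overlap(respondents)
```

In the Male vs Total case, the subgroup unique to Group 1 is empty by construction, which is the situation the overlap adjustment is meant to handle.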
Advanced Z-test for overlapping samples: formula
(variance not pooled, standard error adjusted for overlap, formula adapted for weighting)
0. Terms and definitions
Let the two groups being compared be the first group and the second group.
Define the three mutually exclusive subsets:
- Overlap between the groups (respondents who belong to both groups)
- Only in the first group
- Only in the second group
Variables, defined for each set:
- proportion of success
- base size
- sum of squared weights
If weighting is disabled, or all weights are 1, then the sum of squared weights equals the regular unweighted base size — simply the number of respondents.
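A quick check of this property, as a minimal sketch (the helper name `sum_squared_weights` and the sample weights are made up for illustration):

```python
def sum_squared_weights(weights):
    """Sum of squared respondent weights for one set of respondents."""
    return sum(w * w for w in weights)


# With weighting disabled, every weight is 1, so the sum of squares
# is simply the number of respondents.
unweighted = [1.0] * 250
unweighted_result = sum_squared_weights(unweighted)  # 250.0

# With unequal weights, the sum of squares differs from the base size.
weighted = [0.5, 1.5, 2.0, 1.0]
weighted_result = sum_squared_weights(weighted)  # 7.5, while the base size is 4
```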
1. Calculate the Adjusted Standard Error
2. Calculate the Adjusted Z-score
3. Determine Significance
Compare the Z-score to the standard normal distribution to obtain the p-value. If the p-value is below the chosen significance level (typically 0.05), the difference in proportions is considered statistically significant, meaning it is unlikely to have occurred by random chance alone.
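The final step can be sketched as follows. This is an illustration under stated assumptions, not DataTile's code: it assumes a two-sided test, takes the adjusted standard error as an already-computed input, and uses the usual definition of a Z-score as the difference in proportions divided by its standard error. All function and parameter names are hypothetical.

```python
import math


def z_score(p1, p2, adjusted_se):
    """Difference in proportions divided by the (already adjusted) standard error."""
    return (p1 - p2) / adjusted_se


def two_sided_p_value(z):
    """Two-sided p-value of z under the standard normal distribution.

    Uses the identity P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


def is_significant(z, alpha=0.05):
    """True if the p-value is below the chosen significance level."""
    return two_sided_p_value(z) < alpha


# Example with made-up numbers: 40% vs 30%, adjusted standard error 0.05.
z = z_score(0.40, 0.30, 0.05)
p = two_sided_p_value(z)
significant = is_significant(z)
```

A one-sided test would halve the p-value for a difference in the expected direction. The two-sided form is used here only because it is the more common default.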