Advanced Z-test with Audience Overlap Detection

This article is part of the DataTile series on Significance Testing.

· Unpooled variance · Overlap-adjusted standard error · Supports weighting

Why Adjust for Audience Overlap

When comparing proportions across groups, traditional Z-tests assume the groups are completely independent — that no respondent appears in both groups. But in real-world cases, this assumption is often violated. Imagine comparing Male vs Total. Since Total includes males, the groups partially overlap, which violates the independence assumption and can distort results:

Group correlation: Shared respondents create dependencies between the groups
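Misestimated standard error: Because shared respondents contribute to both proportions, the two estimates are correlated, so the independent-samples standard error no longer describes the true variability of their difference
Distorted statistical significance: A p-value built on the wrong standard error can flag differences that are not real or miss differences that are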

To ensure valid results, it’s essential to adjust for audience overlap when comparing proportions. DataTile handles this automatically using a modified z-test tailored for overlapping groups.

How the Adjustment Works

In DataTile, the Advanced Test for Audience Overlap is applied only when comparing proportions (e.g., % awareness, % usage). It does not apply to comparisons of means or numeric values.

To ensure accurate and reliable comparisons, DataTile uses a modified z-test specifically designed to handle overlapping audiences. This approach replaces the assumption of full independence with a more realistic structure — one that reflects partial audience overlap and weighting. The advanced formula does the following:

Divides the sample into three non-overlapping subgroups:
- respondents unique to Group 1,
- respondents unique to Group 2,
- overlapping respondents (present in both groups).
Weights each subgroup individually using the sum of squared respondent weights, ensuring accurate estimation even in complex weighted samples.
Applies a correction factor to the overlapping portion to adjust for correlation and imbalance between groups.
Combines all three components to compute an adjusted standard error, which reflects the true audience structure and avoids underestimating variability.

This method ensures that the test remains statistically robust, avoiding inflated significance and providing fair comparisons, even in complex segment structures.
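The three-way split and the per-subgroup weight totals described above can be sketched as follows. This is a minimal illustration, not DataTile's implementation: the input layout (tuples of group flags, a 0/1 success indicator, and a weight) and the names `SubsetStats` and `split_overlap` are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class SubsetStats:
    n: int          # unweighted respondent count
    sum_w: float    # sum of weights
    sum_w2: float   # sum of squared weights
    p: float        # weighted proportion of successes (0.0 if the subset is empty)


def split_overlap(respondents):
    """Split respondents into three mutually exclusive subsets.

    respondents: iterable of (in_group1, in_group2, success, weight) tuples,
    a hypothetical layout chosen for this sketch.
    """
    # Per subset: [count, sum of weights, sum of squared weights, weighted successes]
    acc = {
        "only_1": [0, 0.0, 0.0, 0.0],
        "only_2": [0, 0.0, 0.0, 0.0],
        "overlap": [0, 0.0, 0.0, 0.0],
    }
    for in_g1, in_g2, success, w in respondents:
        if in_g1 and in_g2:
            key = "overlap"
        elif in_g1:
            key = "only_1"
        elif in_g2:
            key = "only_2"
        else:
            continue  # respondent belongs to neither group
        a = acc[key]
        a[0] += 1
        a[1] += w
        a[2] += w * w
        a[3] += w * success
    return {
        key: SubsetStats(n, sw, sw2, ws / sw if sw > 0 else 0.0)
        for key, (n, sw, sw2, ws) in acc.items()
    }


# Male vs Total: every male is in both groups, females are only in Total.
sample = [
    (True, True, 1, 1.0),    # male, aware
    (True, True, 0, 1.0),    # male, not aware
    (False, True, 1, 2.0),   # female, aware, weight 2
    (False, True, 0, 1.0),   # female, not aware
]
stats = split_overlap(sample)
```

In the Male vs Total case the "only in Group 1" subset is empty, because every male also belongs to Total; the sketch keeps that subset with zero counts rather than dropping it.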

Advanced Z-test for overlapping samples: formula

(variance not pooled, standard error adjusted for overlap, formula adapted for weighting)

0. Terms and definitions

Let the two groups being compared be:

- Group 1: the first group
- Group 2: the second group

Define the three mutually exclusive subsets:

X₀: Overlap between groups (respondents who belong to both groups)
X₁: Only in the first group
X₂: Only in the second group

Variables:

Proportion of success in each of the three subsets
Base size (number of respondents) in each of the three subsets
Sum of squared weights in each of the three subsets

If weighting is disabled, or all weights are 1, then the sum of squared weights equals the regular unweighted base size — simply the number of respondents.
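A quick check of this property, as a minimal sketch; the helper name `sum_squared_weights` and the weight values are made up for illustration.

```python
def sum_squared_weights(weights):
    """Return the sum of squared respondent weights."""
    return sum(w * w for w in weights)


unweighted = [1.0, 1.0, 1.0, 1.0, 1.0]   # weighting disabled: every weight is 1
weighted = [0.5, 1.5, 1.0, 2.0, 1.0]     # hypothetical weights

ssw_unweighted = sum_squared_weights(unweighted)  # equals len(unweighted)
ssw_weighted = sum_squared_weights(weighted)      # differs from the respondent count
```

With unit weights the result is 5.0, the number of respondents. With the uneven weights it is 8.5, even though there are still five respondents.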

1. Calculate the Adjusted Standard Error

Decomposition of Adjusted Standard Error

The formula consists of three parts, each responsible for a different portion of the sample structure:

Standard variance component for overlapping group (X₀).

Adjusts for dependency between groups. If the groups partially overlap, some respondents appear in both, introducing correlation. This term subtracts the overlapping influence to ensure a fair comparison.

Importantly, the overlapping data is not simply discarded, since that would waste valuable information. Instead, this term retains the shared audience but applies a correction factor based on the difference in group sizes and the weighted base structure.

Standard variance component for X₁, adapted for weighted data.

Reflects the contribution of respondents who are only in group 1 (i.e., not shared with group 2).

Standard variance component for X₂, adapted for weighted data.

Reflects the contribution of respondents who are only in group 2 (i.e., not shared with group 1).
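A three-part standard error of this shape can be derived from first principles. The sketch below is a derivation under stated assumptions and may differ in detail from the formula DataTile actually applies. It assumes that respondents are independent of one another and that weights are fixed constants. The notation is introduced here and is not taken from the article: W_i is the sum of weights, S_i the sum of squared weights, and p_i the weighted proportion of success in subset i, where 0 is the overlap, 1 is "only in group 1", and 2 is "only in group 2".

```latex
% Assumed notation (introduced for this sketch):
%   W_i = sum of weights, S_i = sum of squared weights,
%   p_i = weighted proportion of success in subset i (i = 0, 1, 2).
\begin{align*}
P_1 &= \frac{W_1 p_1 + W_0 p_0}{W_1 + W_0}, \qquad
P_2 = \frac{W_2 p_2 + W_0 p_0}{W_2 + W_0} \\[4pt]
P_1 - P_2 &= \frac{W_1}{W_1 + W_0}\, p_1
  - \frac{W_2}{W_2 + W_0}\, p_2
  + \left( \frac{W_0}{W_1 + W_0} - \frac{W_0}{W_2 + W_0} \right) p_0 \\[4pt]
\operatorname{Var}(p_i) &\approx \frac{S_i}{W_i^{2}}\, p_i (1 - p_i) \\[4pt]
SE_{adj}^{2} &=
  \left( \frac{1}{W_1 + W_0} - \frac{1}{W_2 + W_0} \right)^{2} S_0\, p_0 (1 - p_0)
  + \frac{S_1\, p_1 (1 - p_1)}{(W_1 + W_0)^{2}}
  + \frac{S_2\, p_2 (1 - p_2)}{(W_2 + W_0)^{2}}
\end{align*}
```

The first term is the overlap component: its factor depends on the difference between the two weighted group sizes, and it vanishes when both groups have the same weighted size. The second and third terms are the contributions of the respondents unique to each group. With all weights equal to 1, W_i and S_i both reduce to the respondent count of the subset.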

2. Calculate the adjusted Z-score
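The Z-score is the difference between the two group proportions divided by the adjusted standard error. The sketch below computes both. The standard error it uses is a first-principles form (independent respondents, weights treated as fixed constants) with an overlap term plus one term per unique subset; it is an assumption of this example and may differ in detail from DataTile's formula. The function name `adjusted_z` and its parameter names are hypothetical.

```python
import math


def adjusted_z(p0, w0, s0, p1, w1, s1, p2, w2, s2):
    """Overlap-adjusted z-score (illustrative sketch, not DataTile's exact formula).

    For each subset K (0 = overlap, 1 = only group 1, 2 = only group 2):
      pK = weighted proportion of success,
      wK = sum of weights,
      sK = sum of squared weights.
    Returns (z, standard_error).
    """
    size1 = w1 + w0  # weighted size of group 1
    size2 = w2 + w0  # weighted size of group 2

    # Group proportions rebuilt from the subset proportions.
    prop1 = (w1 * p1 + w0 * p0) / size1
    prop2 = (w2 * p2 + w0 * p0) / size2

    variance = (
        (1 / size1 - 1 / size2) ** 2 * s0 * p0 * (1 - p0)  # overlap term
        + s1 * p1 * (1 - p1) / size1 ** 2                  # only in group 1
        + s2 * p2 * (1 - p2) / size2 ** 2                  # only in group 2
    )
    se = math.sqrt(variance)
    if se == 0:
        raise ValueError("standard error is zero; z-score is undefined")
    return (prop1 - prop2) / se, se


# Unweighted Male vs Total: 400 males (50% aware) are the overlap,
# 600 females (40% aware) are only in Total, nobody is only in Male.
z, se = adjusted_z(
    p0=0.5, w0=400, s0=400,
    p1=0.0, w1=0, s1=0,
    p2=0.4, w2=600, s2=600,
)
```

In this example the Male proportion is 0.50, the Total proportion is 0.44, the standard error is roughly 0.0192, and the Z-score is roughly 3.12.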

3. Determine Significance

Compare the Z-score to the standard normal distribution to obtain the p-value. If the p-value is below the chosen significance level (typically 0.05), the difference in proportions is considered statistically significant, meaning it is unlikely to have occurred by random chance alone.
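Converting a Z-score to a p-value can be sketched with the Python standard library alone. The example assumes a two-sided test, which the text above does not state explicitly; the function names are hypothetical.

```python
import math


def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))


def is_significant(z, alpha=0.05):
    """True if the two-sided p-value is below the chosen significance level."""
    return two_sided_p_value(z) < alpha


p_strong = two_sided_p_value(3.12)  # well below 0.05
p_weak = two_sided_p_value(1.0)     # well above 0.05
```

At the 0.05 level, the two-sided cutoff corresponds to an absolute Z-score of about 1.96.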