Standard T-test in DataTile

This article is part of the DataTile series on Significance Testing.

· For means
· Independent groups
· Unpooled variance
· Appropriate for large samples (30+ observations per group)

As outlined in the “Before You Test” article, all statistical tests in DataTile are designed for independent samples — we never compare paired or dependent observations.

To compare numeric values (e.g., average spend or satisfaction scores), DataTile applies a Z-approximation of Welch’s t-test, which does not assume equal variances between the groups (unpooled variance). This method is appropriate for large samples and avoids relying on the often unrealistic assumption that both groups have the same spread.

For clarity and transparency, we also include the formulas for the pooled variance version of the test, but this variant is not used in DataTile and is shown here only for educational comparison.

Step-by-Step: T-Test Algorithm

1. Calculate the Sample Means

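Let the two groups being compared be:

- $X_1$: first group, with $n_1$ observations $x_{1,1}, \dots, x_{1,n_1}$
- $X_2$: second group, with $n_2$ observations $x_{2,1}, \dots, x_{2,n_2}$

Calculate the sample means as:

$$\bar{x}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} x_{1,i}, \qquad \bar{x}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} x_{2,i}$$

- $\bar{x}_1$: sample mean of $X_1$
- $\bar{x}_2$: sample mean of $X_2$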
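The following is a minimal Python sketch of this step using only the standard library. It assumes each group arrives as a plain list of numeric responses; the names `group_1` and `group_2` and the values are hypothetical and deliberately small (DataTile's method is intended for 30+ observations per group).

```python
from statistics import fmean

# Hypothetical example data: numeric responses for each group
group_1 = [12.0, 15.5, 9.0, 14.0, 11.5]
group_2 = [10.0, 8.5, 13.0, 9.5, 11.0]

n_1, n_2 = len(group_1), len(group_2)  # sample sizes n_1 and n_2
mean_1 = fmean(group_1)                # sample mean of X_1
mean_2 = fmean(group_2)                # sample mean of X_2
print(n_1, mean_1, n_2, mean_2)
```

`statistics.fmean` always returns a float, which is convenient when responses are integers such as rating-scale scores.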

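2. Calculate the Unpooled Sample Variances

$$s_1^2 = \frac{1}{n_1 - 1}\sum_{i=1}^{n_1} \left(x_{1,i} - \bar{x}_1\right)^2, \qquad s_2^2 = \frac{1}{n_2 - 1}\sum_{i=1}^{n_2} \left(x_{2,i} - \bar{x}_2\right)^2$$

$n_1 - 1$ and $n_2 - 1$ are the degrees of freedom for $X_1$ and $X_2$, respectively.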
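A minimal Python sketch of this step, with hypothetical data. `statistics.variance` divides by $n - 1$, matching the degrees of freedom used here, whereas `statistics.pvariance` divides by $n$ and would give the population variance instead.

```python
from statistics import variance

# Hypothetical example data, for illustration only
group_1 = [12.0, 15.5, 9.0, 14.0, 11.5]
group_2 = [10.0, 8.5, 13.0, 9.5, 11.0]

# Sample variances s_1^2 and s_2^2 (sum of squared deviations / (n - 1))
var_1 = variance(group_1)
var_2 = variance(group_2)

# Degrees of freedom for each group
df_1, df_2 = len(group_1) - 1, len(group_2) - 1
print(var_1, var_2, df_1, df_2)
```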

3. Calculate the Pooled Sample Variance

In the pooled approach only; not used in DataTile:

$$s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}$$
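For comparison only, the pooled variance can be sketched the same way. `pooled_variance` is a hypothetical helper written for illustration (it is not part of DataTile), and the data are illustrative:

```python
from statistics import variance

def pooled_variance(a, b):
    """Pooled sample variance s_p^2 (comparison only; not used in DataTile)."""
    n_a, n_b = len(a), len(b)
    # Weight each group's sample variance by its degrees of freedom (n - 1)
    return ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)

# Hypothetical example data, for illustration only
group_1 = [12.0, 15.5, 9.0, 14.0, 11.5]
group_2 = [10.0, 8.5, 13.0, 9.5, 11.0]
s_p2 = pooled_variance(group_1, group_2)
print(s_p2)
```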

4. Calculate the Standard Error

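Pooled variance (not used in DataTile):

$$SE_p = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Unpooled variance (used in DataTile):

$$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$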
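A minimal sketch of both standard errors in Python. The helper names `se_unpooled` and `se_pooled` and the sample groups are hypothetical, chosen with unequal sizes and spreads so the two formulas give different results:

```python
import math
from statistics import variance

def se_unpooled(a, b):
    """Unpooled (Welch) standard error, the variant used in DataTile."""
    return math.sqrt(variance(a) / len(a) + variance(b) / len(b))

def se_pooled(a, b):
    """Pooled standard error, shown for comparison only."""
    n_a, n_b = len(a), len(b)
    s_p2 = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return math.sqrt(s_p2 * (1 / n_a + 1 / n_b))

# Hypothetical groups of unequal size and spread
small_tight = [1.0, 2.0, 3.0]
large_wide = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(se_unpooled(small_tight, large_wide), se_pooled(small_tight, large_wide))
```

For these hypothetical groups the unpooled standard error is 1.0 and the pooled one is about 1.34. With equal group sizes the two formulas coincide algebraically, so the choice matters most when group sizes differ.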

5. Compute the T-Score

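Pooled variance (not used in DataTile):

$$t_p = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

Unpooled variance (used in DataTile):

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$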
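A minimal Python sketch of the unpooled score. `t_score` is a hypothetical helper, not a DataTile function, and the data are illustrative only:

```python
import math
from statistics import fmean, variance

def t_score(a, b):
    """Unpooled (Welch) score: difference in means over the unpooled standard error."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (fmean(a) - fmean(b)) / se

# Hypothetical example data, for illustration only
group_1 = [12.0, 15.5, 9.0, 14.0, 11.5]
group_2 = [10.0, 8.5, 13.0, 9.5, 11.0]
print(t_score(group_1, group_2))
```

Swapping the groups flips the sign of the score but not its magnitude, so the sign only indicates which group has the higher mean.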

6. Z-Approximation of the T-Test

In classical statistics, the t-test compares its statistic against a t-distribution, whose degrees of freedom account for the extra uncertainty of estimating the variance from a finite sample (for Welch’s test, via the Welch–Satterthwaite formula). In many practical cases, especially in marketing research with large samples, the t-test is instead approximated by a Z-test: the statistic is computed with exactly the same formula, but it is interpreted against the standard normal curve, so the degrees of freedom never need to be computed explicitly.

The theoretical foundation for this approach is twofold. By the central limit theorem, sample means are approximately normally distributed when samples are large, and as the sample size (and thus the degrees of freedom) increases, the t-distribution gradually approaches the standard normal (Z) distribution.

The chart below illustrates this convergence:

With df = 1, the t-distribution has heavy tails and a lower peak.
As the degrees of freedom increase (e.g., df = 5, 10), the tails become lighter and the peak rises toward that of the standard normal.
By df = 30 or more, the t-distribution is nearly indistinguishable from the standard normal.
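The same convergence can be checked numerically. This minimal Python sketch uses only the standard library; `t_pdf` is a hypothetical helper written for illustration (not a DataTile function) that evaluates Student’s t density via the log-gamma function, and it compares peak and tail heights with the standard normal from `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # Normalizing constant via log-gamma to avoid overflow for large df
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Peak height (x = 0) and tail height (x = 3) as degrees of freedom grow
for df in (1, 5, 10, 30):
    print(f"df={df:>2}: peak={t_pdf(0, df):.4f}, tail at 3={t_pdf(3, df):.4f}")
print(f"normal: peak={standard_normal.pdf(0):.4f}, tail at 3={standard_normal.pdf(3):.4f}")
```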

In DataTile, we apply a Z-approximation of Welch’s t-test, based on the assumption that with 30+ observations per group, the t-distribution closely aligns with the standard normal.

7. Determine Significance

Compare the T-score to the standard normal distribution to obtain the p-value. If the p-value is below the chosen significance level (typically 0.05), the difference in means is considered statistically significant, meaning it is unlikely to have occurred by random chance alone.
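Putting the steps together, here is a minimal end-to-end sketch in Python. `welch_z_test` is a hypothetical helper written for illustration, not DataTile’s implementation; it assumes a two-sided test (the alternative is not specified here) and uses `statistics.NormalDist` for the standard normal CDF. The tiny sample only demonstrates the arithmetic; the approximation itself is meant for 30+ observations per group.

```python
import math
from statistics import NormalDist, fmean, variance

def welch_z_test(a, b, alpha=0.05):
    """Z-approximation of Welch's t-test for two independent samples.

    Hypothetical helper; assumes a two-sided alternative.
    Returns (statistic, p_value, is_significant).
    """
    # Unpooled (Welch) standard error; each variance divides by n - 1
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    stat = (fmean(a) - fmean(b)) / se
    # Read the statistic off the standard normal curve instead of a
    # t-distribution, so no degrees of freedom are needed
    p_value = 2 * (1 - NormalDist().cdf(abs(stat)))
    return stat, p_value, p_value < alpha

# Hypothetical example data, for illustration only
group_1 = [12.0, 15.5, 9.0, 14.0, 11.5]
group_2 = [10.0, 8.5, 13.0, 9.5, 11.0]
stat, p_value, significant = welch_z_test(group_1, group_2)
print(f"statistic={stat:.3f}, p={p_value:.3f}, significant={significant}")
```

For this hypothetical data the statistic is roughly 1.48 and the p-value roughly 0.14, so the difference would not be significant at the 0.05 level.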