If you’re doing a statistical study, you usually:

  • have one or more samples
  • each sample has a certain number of observations
  • each sample also has a certain variance (measured via std or var)
  • you are seeing if there are “differences” between the means of the samples

Let’s make things a bit more concrete, and consider the example of comparing the means of two groups using a t-test.

The t-test

The very first thing you might do, in order to find out whether the means of these two samples are different, is a t-test. The t-test will tell you two pieces of information:

  • the t-statistic - how different the means are, in terms of the standard error of the difference (meaning: if you repeatedly resampled and calculated the difference in means each time, you would get a distribution; the standard deviation of that distribution is the standard error)
  • the p-value - where your observed difference lies on that distribution: the probability of seeing a difference at least as extreme if the true difference were zero

If you get a really large p-value, it means the difference you observed (whatever the t-statistic is) lies really close to the center of that distribution of differences.
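As a minimal sketch (assuming two independent samples stored in NumPy arrays; the sample values here are made up), this is how you might run a t-test with scipy:

import numpy as np
from scipy.stats import ttest_ind

# two hypothetical samples (placeholders for your own data)
group1 = np.random.normal(0.0, 1.0, 30)
group2 = np.random.normal(0.3, 1.0, 30)

# independent two-sample t-test
t_stat, p_value = ttest_ind(group1, group2)
print(f't-statistic: {t_stat:.3f}, p-value: {p_value:.3f}')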

Now, you should ask yourself: did I get this large p-value simply because my sample size is too small (relative to the variance of the samples)?

You can test this theory by doing a power analysis.

The Power Analysis

We know that each time you take 2 samples from the population, they will be a little different (depending on the variances).

So, sometimes you’ll see a smaller and sometimes a bigger difference between the means of the two samples.

Power analysis can tell you what % of the time you’ll see a difference bigger than a certain threshold.

You tell the power analysis:

  • your sample sizes
  • your sample variances
  • the smallest difference you want to be able to detect (the threshold mentioned above)
  • the significance level you want to test at (usually 0.05)

The power analysis will tell you what % of the time you’ll detect that difference. So if it’s 80%, it means that if you repeated the experiment 100 times, you’d detect the difference you set in about 80 of them.
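To make that concrete, here’s a minimal simulation sketch (the normal data, the true difference of 0.5, and the sample size of 30 are all made-up assumptions) that estimates power as the fraction of repeated experiments in which a t-test comes out significant:

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
n, true_diff, sigma, alpha = 30, 0.5, 1.0, 0.05

n_experiments = 1000
detections = 0
for _ in range(n_experiments):
    # simulate one experiment under the alternative hypothesis
    a = rng.normal(0.0, sigma, n)
    b = rng.normal(true_diff, sigma, n)
    if ttest_ind(a, b).pvalue < alpha:
        detections += 1

print(f'Estimated power: {detections / n_experiments:.2f}')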

Generally:

  • the larger the sample size, the higher the power
  • the larger the difference you want to detect, the higher the power
  • the smaller the variance, the higher the power
  • the higher the significance level, the higher the power

Doing Power Analysis in Python

The power analysis calculation is pretty complex, but you can use Python’s statsmodels library to do it.

Here’s the code (we’ll talk about it later!):

import statsmodels.stats.power as smp

# params
effect_size = 0.2  # small effect
alpha = 0.05       # significance level
nobs1 = 100        # sample size 1
nobs2 = 30         # sample size 2

# perform power analysis (ratio is the size of sample 2 relative to sample 1)
power = smp.tt_ind_solve_power(effect_size=effect_size, nobs1=nobs1, alpha=alpha,
                               ratio=nobs2 / nobs1, alternative='two-sided')
print(f'Power: {power:.3f}')

I think all of that will be straightforward except the effect_size. This is the smallest difference you want to be able to detect, expressed in terms of the standard deviation of the samples. So, in other words, it’s the difference you want to detect divided by the standard deviation. Generally:

  • 0.2 is a small effect
  • 0.5 is a medium effect
  • 0.8 is a large effect

But, obviously, these conventions depend on your field/context! Just remember what the value means: the difference expressed in standard deviations. Use whatever value makes sense and is acceptable in your situation.
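If you want to compute an observed effect size from your own data, a common choice is Cohen’s d: the difference in means divided by the pooled standard deviation. A minimal sketch, assuming two independent samples with roughly equal variances:

import numpy as np

def cohens_d(x, y):
    # pooled standard deviation of the two samples
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

# made-up example data
x = np.random.normal(0.0, 1.0, 50)
y = np.random.normal(0.5, 1.0, 50)
print(f"Cohen's d: {cohens_d(x, y):.2f}")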

How to Approach

Before doing a statistical experiment/analysis, you should consider doing a power analysis to see if you’ll have enough power to detect a difference once you sample. If you don’t have enough power, you may need to increase your sample size or change your test (some tests are more powerful than others).
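Conveniently, tt_ind_solve_power solves for whichever parameter you leave as None, so you can ask it for the sample size needed to reach a target power. A sketch, assuming you want 80% power for a medium effect at the usual significance level:

import statsmodels.stats.power as smp

# leave nobs1 as None and pass the power you want; the function then
# solves for the required size of sample 1 (ratio=1.0 means equal groups)
nobs1 = smp.tt_ind_solve_power(effect_size=0.5, nobs1=None, alpha=0.05,
                               power=0.8, ratio=1.0, alternative='two-sided')
print(f'Required sample size per group: {nobs1:.1f}')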

Alternatively, after doing a statistical test, if you notice you have a large p-value, you may want to do a power analysis to see if you simply didn’t have enough power to detect the difference you were looking for.

  • if you get a low power, then you probably didn’t have enough samples to detect the difference
  • if you get a high power, then you can be more confident that the difference you were looking for really isn’t there

How to Calculate Power for Permutation Tests

You can estimate it by simulation: repeatedly resample your data under the alternative hypothesis, run the permutation test on each resample, and count the fraction of runs where the p-value falls below your significance level. That fraction is the estimated power of the test.

Here’s how to do it in Python:

import numpy as np
from scipy.stats import permutation_test

def calculate_power(data1, data2, num_permutations=1000, alpha=0.05, num_simulations=1000):
    significant_results = 0

    for _ in range(num_simulations):
        # Bootstrap resample from the observed data (treating the observed
        # samples as the population under the alternative hypothesis)
        sample1 = np.random.choice(data1, size=len(data1), replace=True)
        sample2 = np.random.choice(data2, size=len(data2), replace=True)

        # Perform the permutation test on the resampled data
        result = permutation_test((sample1, sample2),
                                  statistic=lambda x, y: np.mean(x) - np.mean(y),
                                  permutation_type='independent',
                                  n_resamples=num_permutations,
                                  alternative='two-sided')
        
        # Check if the p-value is less than alpha
        if result.pvalue < alpha:
            significant_results += 1

    # Calculate power
    power = significant_results / num_simulations
    return power

# Example usage
data1 = np.random.normal(0, 1, 50)
data2 = np.random.normal(0.5, 1, 50)
power = calculate_power(data1, data2)
print(f'Estimated Power: {power:.4f}')