**A “Choose-your-own-adventure” hypothesis test**

The last three posts focused on revealing our human mistakes in interpreting statistics and providing solutions to overcome those pitfalls. Now, we will focus on tools that you might consider using in every statistical project.

First up is the permutation test, an alternative to the *t*-test for comparing two populations. “An alternative,” you say, “why would we ever need that?” Unfortunately, real-world experiments do not always yield perfect data. Sometimes you only have 15 samples. Sometimes the populations were unevenly sampled. Often there is no guarantee that the underlying distributions are normal with equal variances in the two populations. While the *t*-test is robust, you may be uncomfortable with violating its assumptions. For example, you might have samples from two populations that look like the samples below:

Permutation tests shine here because they make fewer assumptions about your data. Rather than assume any underlying distribution, the first step in a permutation test is to construct a null distribution from the data by shuffling (or “permuting”) the data so that the population labels are scrambled. After all, if the two populations are the same, then the values should be exchangeable and shuffling the labels should be meaningless. Repeating this shuffle 1,000 times or so and calculating the difference in means each time provides a null distribution against which to assess the unusualness of the observed (unpermuted) difference. The proportion of values in the null distribution that are more extreme than the observed difference is the *p*-value of the permutation test. Below, the red lines indicate the observed difference in means for the comparison shown above. A comparison of the two observed means yields a *p*-value of 0.004.

**Basic steps of a permutation test (a.k.a. Randomization Test):**

1. Calculate the observed test statistic, for example the difference between the means of Sample 1 and Sample 2.
2. Permute (i.e., shuffle) the sample labels of the observations to simulate a new Sample 1 and Sample 2 from the same data.
3. Calculate the same test statistic for the permuted data. This gives one example of what the test statistic would look like if the null hypothesis of no difference between populations were true.
4. Repeat steps 2 and 3 a thousand times. With each repetition, you get an additional example of what the test statistic looks like, by chance, when the null hypothesis is true.
5. Compare the observed test statistic (step 1) to the distribution of values that describe how things would be if the null hypothesis were true. The proportion of the permuted statistics that is more extreme than the observed value is the *p*-value for the permutation test.
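
The steps above can be sketched in a few lines of code. The version below is only an illustration in Python using the standard library, not the R code behind this article. The function name `permutation_test`, its parameters, and the choice to count permuted differences that are at least as extreme as the observed one (a two-sided comparison that includes ties) are all assumptions made for this sketch.

```python
import random
from statistics import mean


def permutation_test(sample1, sample2, n_permutations=1000, seed=None):
    """Two-sided permutation test for a difference in means (illustrative sketch).

    Returns a tuple (observed_difference, p_value).
    """
    rng = random.Random(seed)
    n1 = len(sample1)

    # Step 1: the observed test statistic.
    observed = mean(sample1) - mean(sample2)

    # Pool the data so the group labels can be scrambled.
    pooled = list(sample1) + list(sample2)
    extreme = 0
    for _ in range(n_permutations):  # Step 4: repeat many times.
        # Step 2: shuffling the pooled values reassigns the labels at random.
        rng.shuffle(pooled)
        # Step 3: the same statistic, computed on the relabeled data.
        permuted = mean(pooled[:n1]) - mean(pooled[n1:])
        # Step 5: tally permuted statistics at least as extreme as the observed one
        # (assumption: ties are counted as extreme).
        if abs(permuted) >= abs(observed):
            extreme += 1

    return observed, extreme / n_permutations


# Made-up numbers, purely for illustration.
group_a = [1, 2, 3, 4, 5]
group_b = [101, 102, 103, 104, 105]
difference, p_value = permutation_test(group_a, group_b, seed=1)
print(difference, p_value)
```

Because the shuffles are random, the *p*-value will vary slightly from run to run unless you fix the seed.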

Comparing populations in other ways can also be useful. Because you define your test statistic as part of the permutation test, you can design practically any statistic you want to compare. Instead of looking at the difference in means, for example, you can calculate the difference in variances for each permutation. As seen below, there is a significant difference in the variances of the two samples we used earlier. You can even look at the difference in *e^x*, where *x* is the measured data, if you wanted. The choices are endless and the same procedure applies. Simply substitute in your custom metric wherever you would calculate a test statistic.
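
To make the "substitute in your custom metric" idea concrete, here is a hedged Python sketch in which the test statistic is passed in as a function. The names `permutation_p_value`, `diff_in_variances`, and `diff_in_mean_exp` are hypothetical, and reading "the difference in *e^x*" as the difference in the group means of *e^x* is an assumption of this example.

```python
import random
from math import exp
from statistics import mean, variance


def permutation_p_value(sample1, sample2, statistic, n_permutations=1000, seed=None):
    """Permutation p-value for any two-sample statistic (illustrative sketch)."""
    rng = random.Random(seed)
    n1 = len(sample1)
    observed = statistic(sample1, sample2)
    pooled = list(sample1) + list(sample2)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # scramble the group labels
        permuted = statistic(pooled[:n1], pooled[n1:])
        # Assumption: two-sided comparison, ties counted as extreme.
        if abs(permuted) >= abs(observed):
            extreme += 1
    return extreme / n_permutations


def diff_in_variances(a, b):
    """Difference in sample variances (each group needs at least two values)."""
    return variance(a) - variance(b)


def diff_in_mean_exp(a, b):
    """Difference in the group means of e^x (one reading of 'difference in e^x')."""
    return mean([exp(x) for x in a]) - mean([exp(x) for x in b])


# Made-up numbers, purely for illustration.
narrow = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0]
wide = [1.0, 9.5, 3.2, 7.8, 0.4, 8.1]
print(permutation_p_value(narrow, wide, diff_in_variances, seed=1))
print(permutation_p_value(narrow, wide, diff_in_mean_exp, seed=1))
```

Only the `statistic` argument changes between the two calls; the shuffling and counting stay exactly the same.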

Of course, permutation tests are not without limitations. The nature of shuffling labels in the data set to form a null distribution assumes that there is no structure in your data – for example, no correlations or grouping among samples that would be lost when permuting. Additionally, permutation tests only provide *p*-values. As we have discussed in a previous installment, confidence intervals and effect sizes are also important metrics for judging the statistical and practical significance of observed patterns. These metrics can also be captured using simulation approaches.
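
As one example of a simulation approach to confidence intervals, the sketch below uses a percentile bootstrap for the difference in means. This is an assumption about which method fits here, not a description of the code behind this article, and the function name `bootstrap_ci_diff_means` and its parameters are hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci_diff_means(sample1, sample2, n_resamples=2000, level=0.95, seed=None):
    """Percentile bootstrap interval for mean(sample1) - mean(sample2) (sketch)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping the group sizes fixed.
        resample1 = rng.choices(sample1, k=len(sample1))
        resample2 = rng.choices(sample2, k=len(sample2))
        diffs.append(mean(resample1) - mean(resample2))
    diffs.sort()

    # Read the interval endpoints off the sorted bootstrap differences.
    alpha = 1 - level
    lower_index = int((alpha / 2) * n_resamples)
    upper_index = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return diffs[lower_index], diffs[upper_index]


# Made-up numbers, purely for illustration.
group_a = [12.1, 14.3, 11.8, 15.0, 13.2, 12.7]
group_b = [10.4, 11.1, 9.8, 12.0, 10.9, 11.5]
print(bootstrap_ci_diff_means(group_a, group_b, seed=1))
```

Unlike the permutation test, the bootstrap resamples within each group rather than scrambling labels, so the interval describes the observed difference itself (a simple effect size) rather than the null hypothesis.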

Nevertheless, when conditions are right, permutation tests are a powerful tool for hypothesis testing that circumvents some of the assumptions of parametric tests and allows increased flexibility in making insightful comparisons between populations. Here is the R code that generated all the graphics and tests in this article. You can see how a permutation test compares to a *t*-test and try creating test statistics other than the mean. Feel free to experiment!

Thanks again for reading, and we hope you will join us in a few weeks for the final post of this series! It will cover power analyses, a simulation-based tool that can aid in study design and provide insight into the strength of your statistical tests. Stay tuned!

**Sources**

Ong, D. C. (2014) “A primer to bootstrapping; and an overview of doBootstrap.” URL: https://web.stanford.edu/class/psych252/tutorials/doBootstrapPrimer.pdf

**Past Articles in the Series**

*I am working with E. Ashley Steel and Rhonda Mazza at the PNW Research Station to write short articles on how we can improve the way we think about statistics. Consequently, I am posting a series of five blogs that explores statistical thinking, provides methods to train intuition, and instills a healthy dose of skepticism. Subscribe to this blog or follow me @ChenWillMath to know when the next one comes out!*

*Ideas in this series are based on material from the course, “So You Think You Can Do Statistics?” taught by Dr. Peter Guttorp, Statistics, University of Washington with support from Dr. Ashley Steel, PNW Station Statistician and Quantitative Ecologist, and Dr. Martin Liermann, statistician and quantitative ecologist at NOAA’s Northwest Science Center.*