Category Archives: Communication

Patterns from Noise

What does the p-value really tell us?

Welcome back! If you missed the previous installment, you can find it here.

Continuing the series, we’ll be talking about the p-word. That’s right, “p-values”: a concept so central to statistics, yet one of the most often misunderstood.

Not too long ago, the journal Basic and Applied Social Psychology straight up banned p-values from appearing in its articles. This and other controversies about the use and interpretation of p-values led the American Statistical Association (ASA) to issue a statement on p-values; writing such recommendations on the fundamental use of statistics was unprecedented for the organization.

Part of the confusion stems from the complacency with which we teach p-values, leading to blind applications of p-values as the litmus test for significant findings.

Q: Why do so many colleges and grad schools teach p = 0.05?
A: Because that’s still what the scientific community and journal editors use.

Q: Why do so many people still use p = 0.05?
A: Because that’s what they were taught in college or grad school.

– George Cobb

Snide comments aside, let us unpack what a p-value does and does not tell us. First, take a look at the following twenty sets of randomly generated data:

[Figure: PatternFromNoise.png, twenty panels of randomly generated points]

Each one of the boxes contains 50 points whose x-y coordinates were randomly generated from a normal distribution with mean 0 and variance 1. Yet, we see that there is occasionally a set of points that appears to have a trend, such as the one highlighted in red, which turns out to exhibit a correlation of 0.45. If even random noise can display patterns, how do we discern when we have a real mechanism influencing some response versus simply random data? P-values provide this support by giving us a measure of how “weird” an observed pattern is, given a proposal of how the world works.
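If you would like to try this yourself, here is a minimal sketch of the experiment described above: twenty panels of 50 points each, with both coordinates drawn from a standard normal distribution. The function names and the seed are my own choices for illustration, not the code used to make the original figure.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def noise_panel_correlations(n_panels=20, n_points=50, seed=1):
    """Correlation of each panel when x and y are independent N(0, 1) noise."""
    rng = random.Random(seed)  # seed is arbitrary; change it to see other draws
    correlations = []
    for _ in range(n_panels):
        xs = [rng.gauss(0, 1) for _ in range(n_points)]
        ys = [rng.gauss(0, 1) for _ in range(n_points)]
        correlations.append(pearson_r(xs, ys))
    return correlations


correlations = noise_panel_correlations()
strongest = max(correlations, key=abs)
print(f"strongest correlation among {len(correlations)} noise panels: {strongest:+.2f}")
```

Rerunning with different seeds gives a feel for how strong a "trend" pure noise can produce when you get to pick the most striking panel.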

More formally, the definition of a p-value is “the probability under a specified statistical model that a statistical summary of the data would be equal to or more extreme than its observed value” (taken from the ASA). Note that this says nothing about the real world. Rather, it measures how much doubt we have about one particular statistical view of the world. If our null hypothesis were true and our model of the world pretty accurate, a “statistically significant” p-value means that something unlikely has happened (where unlikely could be defined as a 1 in 20 chance or less): so unlikely that it casts significant doubt on whether that null hypothesis is a very good model of the world after all. It is important to note, however, that this does not mean that your alternative hypothesis is true.
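To make the definition concrete, here is a rough sketch that estimates a p-value by simulation. The "specified statistical model" is the null hypothesis that x and y are independent standard normal noise, the "statistical summary" is the correlation, and "equal to or more extreme" is taken to mean at least as large in absolute value (a two-sided choice, which is an assumption on my part). The observed value of 0.45 comes from the highlighted panel; the function names, seed, and number of simulations are illustrative.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def monte_carlo_p_value(observed_r, n_points=50, n_sims=5000, seed=2):
    """Share of null simulations whose |correlation| is at least |observed_r|."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Simulate one data set under the null model: independent N(0, 1) noise.
        xs = [rng.gauss(0, 1) for _ in range(n_points)]
        ys = [rng.gauss(0, 1) for _ in range(n_points)]
        if abs(pearson_r(xs, ys)) >= abs(observed_r):
            hits += 1
    return hits / n_sims


p_value = monte_carlo_p_value(0.45)
print(f"estimated two-sided p-value for r = 0.45 with 50 points: {p_value:.4f}")
```

Keep in mind that this estimate treats the panel as if it had been chosen before looking at the data. Picking the most striking of twenty panels and then testing it is a different situation.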

Conversely, a non-significant p-value is not an indication that your null hypothesis is true. Rather, it reflects a lack of evidence that your null hypothesis is an inaccurate model of the world. The null hypothesis may well be accurate, or you may simply not have collected enough evidence to cast significant doubt on an inaccurate null hypothesis. A common trap is to argue for a practical effect because of some perceived pattern even though the p-value is non-significant. Resist this temptation, as the non-significant p-value indicates that the pattern is not particularly unusual even under the null hypothesis. Also resist the temptation to state or even imply that the non-significant p-value indicates (a) there is no effect; (b) there is no difference; or (c) the two populations are the same. Absence of evidence is not evidence of absence.

Ultimately, the p-value is only one aspect of statistical analysis, which is, in turn, only one step in the life cycle of science. P-values only describe how likely it would be to get data like yours, or data more extreme, if the null hypothesis were really how the world worked.

There are, however, some practices that can supplement p-values:

  1. Graph the data. For example, how different do two groups look when you make box plots of their responses? How much data do you really have? Large sample sizes can help elucidate significant differences (a topic we will dive into more in a later installment about statistical power). Are there unusual observations?
  2. More formally, estimate the size of the effect that you are seeing (e.g. via a confidence interval). Is it a potentially large effect that is not significant or a very small effect that is statistically significant? Is the effect size you see relevant to potential real-world decisions? A 95% confidence interval of [0.01, 0.05] may be significantly different from zero, but if that interval represents, say, the increase in °C of river temperature after a wildfire, is it a relevant difference to whatever decision is at hand?
  3. Conduct multiple studies testing the same hypothesis. Real-world data are noisy. Each additional study allows you to update prior information and possibly provide more conclusive support for or against a hypothesis. This is, in fact, the basic idea behind Bayesian statistics, which we do not have the space to cover here, but go here for an introduction on the topic.
  4. Use alternative metrics to corroborate your p-values, such as likelihood ratios or Bayes factors.
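As a rough sketch of the second practice, here is one way to put a 95% confidence interval around a difference in group means. The temperature readings are made-up numbers for illustration only, the function name is hypothetical, and the interval uses a normal approximation (for samples this small, a t-based interval would be somewhat wider).

```python
import math
import statistics


def mean_difference_ci(group_a, group_b, confidence=0.95):
    """Normal-approximation confidence interval for mean(group_b) - mean(group_a)."""
    difference = statistics.mean(group_b) - statistics.mean(group_a)
    standard_error = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return difference - z * standard_error, difference + z * standard_error


# Hypothetical river temperatures in °C before and after a wildfire (invented data).
before = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 12.0]
after = [12.2, 11.9, 12.4, 12.1, 11.9, 12.4, 12.2, 12.1]

low, high = mean_difference_ci(before, after)
print(f"estimated change: between {low:+.2f} and {high:+.2f} °C")
```

The point of reporting the interval rather than just a p-value is that it keeps the size of the effect, in real units, in front of whoever has to make the decision.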

Hopefully, we have provided significant enlightenment on p-values. Next time, we will continue thinking about p-values, specifically the risks involved with testing multiple hypotheses in the same analysis.

Thanks for reading and hope you will join us for the next installment in a few weeks!


Etz, A. (2015) “Understanding Bayes: A Look at the Likelihood.” URL:

Kurt, W. (2016) “A Guide to Bayesian Statistics.” URL:

Trafimow, D. and Marks, M. (2015) “Editorial.” URL:

Wasserstein, R.L., and Lazar, N.A. (2016) “The ASA’s statement on p-values: context, process, and purpose.” URL:


Past Articles in the Series

  1. Your Brain on Statistics


Bonus Article: A different type of p-value…


I am working with E. Ashley Steel at the PNW Research Station to write short articles on how we can improve the way we think about statistics. Consequently, I am posting a series of five blogs that explores statistical thinking, provides methods to train intuition, and instills a healthy dose of skepticism. Subscribe to this blog or follow me @ChenWillMath to know when the next one comes out!

Ideas in this series are based on material from the course, “So You Think You Can Do Statistics?” taught by Dr. Peter Guttorp, Statistics, University of Washington with support from Dr. Ashley Steel, PNW Station Statistician and Quantitative Ecologist, and Dr. Martin Liermann, statistician and quantitative ecologist at NOAA’s Northwest Fisheries Science Center.



Your Brain on Statistics

Are apparent patterns indicative of population differences or simply caused by different sample sizes?

We begin by looking at how the wiring of the brain interferes with our ability to process statistics. The way we internalize information and make decisions can be broken down into two categories:

  • System 1 thinking that is automatic and intuition-based
  • System 2 thinking that is more deliberate and analytic

Unfortunately, the impulsive nature of System 1 thinking tends to get us into trouble when we interpret statistics. For example, look at the following map of the lower 48 United States.


It illustrates the counties that exhibit the highest 10% of kidney cancer rates (i.e. kidney cancer cases per capita), colored by whether they are predominantly rural or urban. Note that there are more rural counties represented on the map than urban counties and that many of the cancer-prevalent counties are in the South or Midwest.

Why might that be? Perhaps rural areas tend to have less access to clean water, which could adversely affect kidney function? Perhaps there are more factories in these areas leading to more health issues?

Before you get too far, let me show you another map, this time of the counties in the bottom 10% of kidney cancer incidence.


Interestingly, rural areas appear over-represented among the counties with the lowest kidney cancer rates as well! What is going on?

This was the conundrum that Howard Wainer delved into in an article titled “The Most Dangerous Equation”, published in American Scientist in 2007. Wainer explained how trends can appear even when the underlying probability of an event occurring is constant. Using data from the United States Census Bureau, we have simulated that scenario in the maps above.

The effect you are seeing has nothing to do with rural versus urban, though it would make a believable headline. The real culprit is population size. It turns out that smaller samples, such as less populous counties, are more prone to exhibiting extreme results. Let us explore this further.

Imagine you flipped 3 (fair) coins. The chance of getting either all heads or all tails is 2 × (1/2)^3 = 25%. Now what is the chance of getting all heads or all tails when flipping 30 coins? It is 2 × (1/2)^30, or about 1 in 537 million. Despite the identical chance for any one coin to turn up heads (or tails), larger collections of coin flips are far less likely to come up all heads or all tails.
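The arithmetic is easy to check exactly. In this small sketch (the function name is my own), there are two ways for all the coins to match, each with probability (1/2)^n:

```python
from fractions import Fraction


def prob_all_same(n_coins):
    """Exact chance that n fair coins land all heads or all tails."""
    return 2 * Fraction(1, 2) ** n_coins


for n in (3, 30):
    p = prob_all_same(n)
    print(f"{n} coins: 1 in {p.denominator:,}")
# prints:
# 3 coins: 1 in 4
# 30 coins: 1 in 536,870,912
```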

The take home point: our brains are predisposed to look for and interpret patterns. However, strong patterns, regardless of tempting explanations, can be caused by random chance. Here, sample-size differences across counties are responsible for observed kidney cancer rate differences, despite the constant individual risk of kidney cancer (which is likely not the case, but that is a different discussion).
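Here is a minimal sketch of the same effect with made-up "counties". Every individual has the same risk, yet the smallest counties crowd both the highest and lowest ends of the ranking. The population sizes, the 5% risk, and the function names are assumptions chosen to keep the simulation quick; they are not the Census figures behind the maps.

```python
import random


def simulate_county_rates(n_small=200, n_large=200, small_pop=100,
                          large_pop=5000, risk=0.05, seed=3):
    """Return (observed rate, size label) pairs under one constant individual risk."""
    rng = random.Random(seed)
    counties = []
    for label, n_counties, population in (("small", n_small, small_pop),
                                          ("large", n_large, large_pop)):
        for _ in range(n_counties):
            # Each resident independently becomes a case with the same probability.
            cases = sum(rng.random() < risk for _ in range(population))
            counties.append((cases / population, label))
    return counties


def share_small(counties):
    """Fraction of the given counties that are small."""
    return sum(label == "small" for _, label in counties) / len(counties)


counties = sorted(simulate_county_rates())  # ascending by observed rate
tenth = len(counties) // 10
lowest, highest = counties[:tenth], counties[-tenth:]

print(f"small counties overall:       {share_small(counties):.0%}")
print(f"small counties in bottom 10%: {share_small(lowest):.0%}")
print(f"small counties in top 10%:    {share_small(highest):.0%}")
```

Half of the simulated counties are small, so if size did not matter you would expect roughly half of each extreme group to be small as well.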

So, what should scientists and science readers do? The first step is to remain vigilant. When confronted with apparent patterns, consider whether they might be due to chance alone. For data like these, ask if the more extreme responses are exhibited by the samples that contain fewer individuals or cover smaller areas. You might also consider using simulations to assess how much random chance contributes to apparent patterns. Simulations will be discussed in future installments of this summer statistical thinking series.

If you would like to know more about how the brain tricks you into false statistical conclusions, Amos Tversky and Daniel Kahneman discuss this and many other pitfalls.

Thanks for reading and stay tuned for the next installment! We’ll be talking about the p-word!



Bhalla, J. “Kahneman’s Mind-Clarifying Strangers: System 1 & System 2”. URL: Accessed 27 May 2017.

Tversky, A. & Kahneman, D. (1974) Judgment under Uncertainty: Heuristics and Biases. Science 185 (4157). URL: Accessed 27 May 2017.

United States Census Bureau. “Geography: Urban and Rural”. URL: Accessed 27 May 2017.

Wainer, H. (2007). The Most Dangerous Equation. American Scientist. 95 (3). URL: Accessed 27 May 2017.


Geek Heresy and EarthGames

I’ve recently started reading a fantastic book on a friend’s recommendation called Geek Heresy: Rescuing Social Change from the Cult of Technology. The book examines the culture of technology in human society, delving into how technology came to be so highly regarded as a tool for social change and why this view can be problematic. I’m only one chapter in, but Geek Heresy has already got me thinking about what is likely a central theme: technology does little for social change without the right people to support the change.

Over the weekend, I helped represent EarthGames UW at the second annual Seattle Youth Climate Action Network (Seattle Youth CAN) Summit. During the lunch hour, we let the eager high-schoolers explore some of the games that EarthGames designed over the past year. We followed this up with an activity-packed hour where we guided a dozen students in developing a concept for their very own environmental game!

The event ended up being the highlight of my weekend. I met a young woman who had already designed her own game about pollution using HTML/JavaScript, and within the hour-long game jam, we already had a game concept down (a tower-defense-style game about overfishing)! I got to meet a bunch of really smart kids who were excited to bring about environmental change.

Now, you might be wondering why these two pieces are in the same blog post. Throughout the event, I kept thinking back to Geek Heresy and how these games are like the teaching tools presented at the beginning of the book. While EarthGames UW was founded on the motivation to teach people about climate change and the environment, the games that we make are just as likely to see the same downfalls as the laptops-in-the-wall presented in Geek Heresy’s first chapter: without mentoring or guidance, they bring about less effective social change, or none at all.

I’m glad that EarthGames is taking on more opportunities to engage with youth with games and game design. There’s a lot of potential in using games to engage with the public, and even more in using game design to let the public engage with us and each other. I hope EarthGames will continue to foster collaborations with engagement groups to enable change in our society. If I get the chance (and time!), I hope to be able to foster these collaborations myself.

What do you think is essential for social change? How do you go about engaging your community? Let me know in the comments section!



ENGAGEing graduate student research talks coming to Town Hall Seattle!

I’ve had the excellent opportunity to participate in the University of Washington’s ENGAGE seminar this year. I encourage you to look around their website, but in short, it is a science communication seminar aimed at giving science graduate students the skills to translate their research into a form that is digestible by a general audience.

To show that we can “walk the talk”, so to speak, we will be giving twenty-minute presentations at Town Hall Seattle. Topics this year range from the ethics of social media data to bio-engineered crops to alien life within Arctic glaciers!

I’ll be presenting my research on dam management, and how math and statistics help both human society and rivers have the water they need even as fresh water becomes more scarce. Be sure to be at Town Hall Seattle on May 12 if you want to hear my talk, but talks will be happening throughout March, April, and May. More details to come soon!

ENGAGE Seminar Blog

I’m currently enrolled in a seminar on science communication called ENGAGE, and it’s been incredibly informative! The goal of the seminar is to teach graduate students how to effectively communicate their scientific research to the general public. It culminates in a presentation at Town Hall Seattle! Dates are to be determined, but be on the lookout for exciting and accessible talks in a few months!

In the meantime, I did a guest blog for the seminar, which you can find here. While I focused on relating the class assignments to my recent board game exhibition, the same lessons apply to scientific presentations as well. There is only so much time in a presentation, so a real challenge is how to pack everything you want to say into an engaging package without skimping on details. Often in scientific presentations, the temptation is to cram every detail in; after all, we don’t want to misrepresent any aspect of our work, right? Unfortunately, when we do that, our audience only gets the sense that there are a lot of details, and really loses sight of the story.

I recently gave an hour-long presentation to an audience of quantitatively focused professors and students. Generally, this crowd appreciates seeing the details behind mathematical models, so at first I thought, “Hey, there’s this really cool method that I’m incorporating, and I should talk in-depth about it so others can appreciate how cool it is too!” Upon further reflection though, I realized that my story wasn’t really about this cool method. It was how I used this cool method to show an even cooler framework for solving a central conflict in dam management – namely, how do we allocate fresh water so that human society can benefit from rivers, without drastically harming the river itself? In the end, I cut most of my discussion of the “cool method” and focused on the “cool story” where the “cool method” was a supporting actor.

The result? I got to talk about the “cool method” without it interfering with the overall “cool story”. For the people that were interested in the math, I offered up a novel tool. For the people that were interested in dams and applications, I offered up a success story for an incredibly challenging problem. By dialing back on the accuracy just a bit, I was able to engage my audience a little more, and everyone ended up winning.

Twitter Notes: Salmon, Bayesian Models, and Portfolios

The following notes are from the Fall QERM Seminar, where faculty give presentations on their research. This week features Daniel Schindler from the School of Aquatic and Fishery Sciences (SAFS).

  • Ecology happens at massively varied spatial and temporal scales
  • Quantitative methods allow us to integrate spatial and temporal scales
  • Salmon are not only freshwater resource subsidies, but are also ecosystem engineers
  • Time series of water oxygen content contain a lot of info on the gas exchange rate between the water and the atmosphere
  • Bayesian models can estimate gas exchange rates with orders of magnitude greater accuracy than empirical surveys
  • Variation in population dynamics across streams keeps fisheries sustainable (because of fishing portfolios)
  • Variability comes from local adaptation in fish and from shifting mosaics of suitable habitat
  • Models can (and should!) be used to make science more transparent

I was impressed by the speaker’s example of how models could be used to communicate a scientific result to a general audience. So here’s my question to you:

What should you do (or not do) to make your research more accessible to a general audience?

Twitter notes

I was exposed to an interesting idea today that I want to try out: condensing a talk into tweets. Now, I still haven’t gotten into using Twitter, but I think this is a neat idea. Every few slides (or minutes), summarize the talk in a tweet (140 characters). If I were using Twitter, I’d do this live, but I don’t so I won’t (and it’d probably be disrespectful in a class of 3-5 anyway).

I tend to be awful at concise summaries and explanations (if you haven’t noticed already), so this should be a fun and useful exercise.