5 Considerations For Statistical Analysis Of Flow Cytometry Data

Congratulations, your grant has been funded! Now comes the hard part — performing the work that you are being funded to do. This means generating data and publishing papers. What was that hypothesis again? It must be in the grant somewhere, right?

For the sake of this blog, the grant is to study the effects of Cordilla Virus, which is known to cause lipid membrane flipping on CD8+ T-cells. This flipping results in phosphatidylserine expression on the outer membrane, causing infected cells to be phagocytosed by macrophages. A lead compound, Masiform D, has been identified that shows promise in reducing the viral load of patients infected with Cordilla Virus.

To avoid even the appearance of HARKing — Hypothesizing After The Results Are Known — it is important to start at the beginning of the statistical analysis process even before the first experiments are performed. This process consists of 5 steps:

1. Set the Null Hypothesis

The null hypothesis (H0) is a statement of what we think the state of the system is. In this case, the state of the system after treatment would be that there was no change in the system — i.e. the means between control and experimental are equal.

When performing statistical inference testing, this is the baseline state of our system. We are going to demand a great deal of proof (“beyond a reasonable doubt”) to reject the null hypothesis and accept the alternative hypothesis (HA).

If our data meet the standard needed to reject the H0, we can be confident in our HA. If our data do not allow us to reject the H0, it doesn’t mean that the H0 is true; it just means the evidence doesn’t support rejecting it.

When stating the null, it is important to remember that the “equals” sign will be associated with the H0, and not the HA. Additionally, if we are predicting that the effect will be in one direction, that is acceptable: the H0 then uses “≤” or “≥”, and the HA uses the strict inequality in the predicted direction.

Going back to our hypothetical grant, we are looking to see if Masiform D reduces the viral load on patients infected with Cordilla Virus. Since we only care if Masiform D decreases viral load in our patients (as opposed to increased or stable load), the H0 and HA could be stated like this:

H0: Viral load (VL) in patients infected with Cordilla Virus and treated with Masiform D (MD) remains the same or is increased compared to patients infected with Cordilla Virus who were not treated with Masiform D (UT): VL_MD ≥ VL_UT

HA: Viral load (VL) in patients infected with Cordilla Virus and treated with Masiform D (MD) is decreased compared to patients infected with Cordilla Virus who were not treated with Masiform D (UT): VL_MD < VL_UT

Notice 2 important factors about the H0 and HA:

  1. They are mutually exclusive: no outcome can satisfy both the H0 and the HA.
  2. They are collectively exhaustive: every possible outcome is covered by either the H0 or the HA.

Having set the stage, it is time to build the other factors that will go into determining whether we accept or reject the H0.

2. Establish a threshold

The threshold (or α) can be thought of as the finish-line. If the analysis of the data crosses this threshold, the H0 is rejected and the HA is accepted.

This threshold is typically set at 0.05, based on convention. However, lowering the threshold to 0.01 or even 0.001 may be more appropriate.

There is no easy guidance for this choice, except that the higher the threshold (the larger the p-value you are willing to accept), the easier it is to find significance, and the more likely you are to commit a type I (false positive) error. The table below is based on this article, which categorizes p-values based on the consequences of a false positive.

p Value (assumes a 2-tailed test) | Consequences of False Positive
0.01 | Death: think clinical trials
0.05 | No publication: standard for most publications
0.1  | You lose time and money: increased false positives means chasing down more leads, but few other consequences
0.2  | You lose more time and money: increased false positives means chasing down more leads, but few other consequences
0.49 | You are wrong: just a bit better than flipping a coin

As the threshold is lowered, it becomes more important to increase the “n” (that is, the number of samples). A Power calculation provides an estimate of the number of samples that need to be collected.

Power is also tied to the chance of a false negative (β), since Power = 1-β. You can find a power calculator here for free. It’s always a good idea to check with your local biostatistician before you finalize your plan!
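To make the link between threshold, power, and “n” concrete, here is a minimal sketch of a sample size estimate. It assumes a comparison of two group means and uses the common normal approximation; the function name and the standardized effect size (difference in means divided by the standard deviation) are illustrative choices, not something taken from the power calculator mentioned above. A t-based calculation typically asks for slightly more samples.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.80, tails=1):
    """Approximate number of samples needed in each of two groups.

    Normal approximation: n = 2 * ((z_alpha + z_power) / d) ** 2,
    where d is the standardized effect size (hypothetical input:
    expected difference in means divided by the standard deviation).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / tails)  # quantile for the chosen threshold
    z_power = z.inv_cdf(power)              # quantile for Power = 1 - beta
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Lowering the threshold raises the number of samples required.
for a in (0.05, 0.01, 0.001):
    print(a, samples_per_group(0.5, alpha=a, tails=2))
```

Treat the result as a starting point for the conversation with your biostatistician rather than a final answer.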

3. Performing the experiments

This is the fun part of the process. Make sure to go through the process to design and validate an experiment that is discussed in other parts of the blog, such as: panel design, instrument optimization, and preparing for your first experiment.

In the case of our hypothetical experiment, flow cytometry was chosen to measure the CD8+ cells in our patients, and to determine the amount of phosphatidylserine on the surface of these cells using the Annexin V reagent.

After the experiment is done, the primary analysis will be performed. A guide about gating and primary analysis can be found here. The important point is that the correct numerical data is extracted from the primary analysis, and the gating is done using all the controls necessary to identify the populations of interest.

4. Performing the statistical test

This is where everything comes together. The experiments are complete, and the numerical data has been extracted.

In the case of our hypothetical experiment, CD8+ cells were identified through immunostaining and the amount of Annexin V staining was described by fluorescence intensity.

Since this is a measure of changes in fluorescence intensity, the data is used to calculate the resolution metric, the RD. The RD is: $R_D = \frac{\text{median}_{pos} - \text{median}_{neg}}{\text{rSD}_{pos} + \text{rSD}_{neg}}$

Thus, the dataset is the list of RD for patients who have been treated versus those who have not been treated. From there, the statistical analysis will be performed.
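As a rough illustration, the RD for one patient could be computed as below. This is a sketch under assumptions: the function names are made up for this example, and the robust standard deviation (rSD) is taken here as 1.4826 times the median absolute deviation. Analysis packages may define rSD differently (for example from percentiles), so check which definition your software uses.

```python
from statistics import median


def robust_sd(values):
    """Robust SD, assumed here to be 1.4826 * median absolute deviation."""
    center = median(values)
    deviations = [abs(v - center) for v in values]
    return 1.4826 * median(deviations)


def resolution_metric(pos, neg):
    """RD = (median_pos - median_neg) / (rSD_pos + rSD_neg)."""
    spread = robust_sd(pos) + robust_sd(neg)
    if spread == 0:
        raise ValueError("both populations have zero robust spread")
    return (median(pos) - median(neg)) / spread


# Made-up fluorescence intensities for one sample
annexin_pos = [90, 100, 110]
annexin_neg = [8, 10, 12]
print(resolution_metric(annexin_pos, annexin_neg))
```

Running this once per patient gives the two lists of RD values (treated and untreated) that go into the statistical test.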

In this experiment, the statistical test that will be used is the Student’s T-Test. This test relies on the T-distribution, which looks like a normal distribution, but has heavier tails, because of something called the “degrees of freedom”.

Mathematically, for a single group of samples, the degrees of freedom are calculated as the sample size minus 1 (n-1). When two groups are compared, each group contributes its own n-1. What do the degrees of freedom really mean?

This is the constraint of the system. For example, if you have 100 values and know their mean, the first 99 can take any value. The 100th value is completely constrained, because it must produce the mean that has been determined. Thus, only n-1 values have freedom.
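The constraint can be shown in a couple of lines. In this small sketch (the function name is just for illustration), once the mean and all but one of the values are known, the remaining value has no freedom left.

```python
def last_value(known_values, mean, n):
    """Return the value the n-th observation must take, given the mean."""
    if len(known_values) != n - 1:
        raise ValueError("expected exactly n - 1 known values")
    # The n values must sum to mean * n, so the last one is forced.
    return mean * n - sum(known_values)


# Three values chosen freely; with a mean of 5 over 4 values, the 4th is fixed.
print(last_value([2, 4, 6], mean=5, n=4))  # prints 8
```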

Returning to the T-distribution, the figure below shows the difference between a normal distribution (green) and T-distributions with 3 degrees of freedom (blue) and 10 degrees of freedom (red). The vertical lines of the same colors mark the 95% confidence intervals.

As can be seen, the fewer the number of samples, the larger the differences have to be to determine statistical significance. The “rejection region” is the area under the curve to the left and right of the 95% lines. If this were a one-tailed test, then only one of these areas would be considered.

Mathematically, there are 2 ways to determine if the data is statistically significant. In either case, we calculate the T statistic, which is defined as:
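For reference, the standard textbook form of the statistic for a two-sample Student’s T-Test with equal variances is sketched below. Treat this as an assumption about the form intended here. The symbols are introduced for illustration: $\bar{x}$ is a sample mean, $s$ a sample standard deviation, and $n$ a sample size, with MD and UT marking the treated and untreated groups.

```latex
T^{*} = \frac{\bar{x}_{MD} - \bar{x}_{UT}}
             {s_p \sqrt{\dfrac{1}{n_{MD}} + \dfrac{1}{n_{UT}}}},
\qquad
s_p^{2} = \frac{(n_{MD} - 1)\, s_{MD}^{2} + (n_{UT} - 1)\, s_{UT}^{2}}
               {n_{MD} + n_{UT} - 2}
```

In this form each group contributes n-1 degrees of freedom, so the test uses $n_{MD} + n_{UT} - 2$ in total.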

Based on the threshold (α, or 1-CI), it is possible to calculate the Critical Value (CV). The decision rule becomes: if T* is more extreme than the CV (T*>CV for an upper-tail test, T*<CV for a lower-tail test), we reject the H0, and thus accept the HA. The alternative is to use T* to calculate the p value. Under this method, if p<α, we reject the H0, and thus accept the HA.

It doesn’t matter which method you use, although the second method is a bit easier to understand and easier to compare to the threshold. Both approaches get you to the same answer, but most statistical programs use the p value method.
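The two decision rules can be compared side by side in a short sketch. It assumes a pooled-variance Student’s T-Test with a one-tailed HA that the treated mean is lower than the untreated mean. All function names are hypothetical, and the T-distribution is integrated numerically only to keep the example free of dependencies. In practice a statistics package would do this step for you.

```python
from math import gamma, pi, sqrt
from statistics import mean, variance


def t_pdf(x, df):
    """Density of the T-distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] plus symmetry."""
    h = abs(x) / steps
    total = t_pdf(0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def critical_value(alpha, df):
    """Lower-tail CV with P(T <= CV) = alpha (alpha < 0.5), by bisection."""
    lo, hi = -1.0, 0.0
    while t_cdf(lo, df) >= alpha:  # widen the bracket until it contains the CV
        lo *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def one_sided_t_test(treated, untreated, alpha=0.05):
    """Test HA: mean(treated) < mean(untreated) with pooled variance."""
    n1, n2 = len(treated), len(untreated)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(untreated)) / df
    t_star = (mean(treated) - mean(untreated)) / sqrt(pooled * (1 / n1 + 1 / n2))
    p_value = t_cdf(t_star, df)     # lower-tail probability
    cv = critical_value(alpha, df)  # negative for a lower-tail test
    return {
        "t": t_star,
        "p": p_value,
        "cv": cv,
        "reject_by_p": p_value < alpha,
        "reject_by_cv": t_star < cv,
    }


# Made-up values purely for illustration
result = one_sided_t_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(result)
```

Because the rejection region for this HA sits in the left tail, the CV is negative and the rule reads T* < CV. Either way, `reject_by_p` and `reject_by_cv` agree.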

FIGURE 1: Comparison of the T distributions with the Standard Normal Distribution.

5. Stating the results

The last step is to state the conclusion. Based on the results of the statistical analysis, there are 2 options: either the H0 is rejected, or not.

If we reject the H0, that means we are confident in the HA.

One way to interpret the p-value is as the answer to a question: if the null hypothesis is true and a random sample were chosen, what is the probability that it would be at least as extreme as the observed data?

Thus, a small p-value can be interpreted as saying that if the null hypothesis were really true, it would be extremely unlikely to see the kind of data that is being observed.

Does that mean that if we don’t reject the H0, we are equally confident that it is true? The answer is no. All it means is that there is not sufficient evidence to reject the H0, which is very different from saying that you should accept the H0.

Is that all? Does it stop at this point? Historically, this was the end of the analysis: papers using the phrase “statistically significant” to describe the data were published, and research marched on.

However, over the last few years, the statistical community has been debating the power of the p-value and how to properly interpret it. This article by Regina Nuzzo describes in detail some of the issues around how the p-value is interpreted, suggesting that, “…it is not as reliable as scientists assume.” This has led the American Statistical Association to develop a statement about the p-value.

There is a role for the p-value in hypothesis testing; however, it is critical that we continue to evaluate the best tools to define statistical significance. These ideas will be the topics of future blog posts, so stay tuned!

ABOUT TIM BUSHNELL, PHD

Tim Bushnell holds a PhD in Biology from the Rensselaer Polytechnic Institute. He is a co-founder of—and didactic mind behind—ExCyte, the world’s leading flow cytometry training company, which organization boasts a veritable library of in-the-lab resources on sequencing, microscopy, and related topics in the life sciences.
