5 Considerations For Statistical Analysis Of Flow Cytometry Data

Congratulations, your grant has been funded! Now comes the hard part — performing the work that you are being funded to do. This means generating data and publishing papers. What was that hypothesis again? It must be in the grant somewhere, right?

For the sake of this blog, the grant is to study the effects of Cordilla Virus, which is known to cause lipid membrane flipping on CD8+ T-cells. This flipping results in phosphatidylserine expression on the outer membrane, causing infected cells to be phagocytosed by macrophages. A lead compound, Masiform D, has been identified that shows promise in reducing the viral load of patients infected with Cordilla Virus.

To avoid even the appearance of HARKing — Hypothesizing After The Results Are Known — it is important to start at the beginning of the statistical analysis process even before the first experiments are performed. This process consists of 5 steps:

1. Set the Null Hypothesis

The null hypothesis (H0) is a statement of what we think the state of the system is. In this case, the state of the system after treatment would be that there was no change in the system — i.e. the means between control and experimental are equal.

When performing statistical inference testing, this is the baseline state of our system. We are going to demand a great deal of proof (“beyond a reasonable doubt”) to reject the null hypothesis and accept the alternative hypothesis (HA).

If our data rises to the level of confidence to reject the H0, we are very sure in our HA. If our data does not allow us to reject the H0, it doesn’t mean that the H0 is true, it just means the evidence doesn’t support rejecting it.

When stating the null, it is important to remember that the “equals” sign will be associated with the H0, and not the HA. Additionally, if we are predicting the effect will be in one direction or the other, that is acceptable in the H0.

Going back to our hypothetical grant, we are looking to see if Masiform D reduces the viral load on patients infected with Cordilla Virus. Since we only care if Masiform D decreases viral load in our patients (as opposed to increased or stable load), the H0 and HA could be stated like this:

H0: Viral load (VL) in patients infected with Cordilla Virus and treated with Masiform D (MD) remains the same or is increased compared to patients infected with Cordilla Virus who were not treated with Masiform D (UT) : VLUT ≥ VLMD

HA: Viral load (VL) in patients infected with Cordilla Virus and treated with Masiform D (MD) is decreased compared to patients infected with Cordilla Virus who were not treated with Masiform D (UT) : VLUT < VLMD

Notice 2 important factors about the H0 and HA

  1. They are mutually exclusive — this means that the H0 and HA do not share anything in common.
  2. They are collectively exhaustive — this means that all possible outcomes are explained by either the the H0 or the HA.

Having set the stage, it is time to build the other factors that will go into determining whether we accept or reject the H0.

2. Establish a threshold

The threshold (or α) can be thought of as the finish-line. If the analysis of the data crosses this threshold, the H0 is rejected and the HA is accepted.

This threshold is typically set at 0.05, based on convention. However, lowering the threshold to 0.01 or even 0.001 may be more appropriate.

There is no easy guidance for this choice, except that the lower the threshold (higher p-value), the easier it is to find significance, and to commit a type I (false positive) error. The table below is based on this article which categorizes p-values based on the consequences of a false positive.

p Value assumes a 2-tailed testConsequences of False Positive
0.01Death — think clinical trials
0.05No publication — standard for most publications
0.1You lose time and money — increased false positives means chasing down more leads, but few other consequences
0.2You lose more time and money — increased false positives means chasing down more leads, but few other consequences
0.49You are wrong — just a bit better than flipping a coin.

As the desired p-value decreases, it is valuable to increase the “n” — that is, the number of samples — which can be determined using the Power calculation, which helps provide an estimate of the number of samples needed to be collected.

It is also a measure of the chance of a false negative (𝛃) since Power = 1-𝛃. You can find a power calculator here for free. It’s always a good idea to check with your local biostatistician before you finalize your plan!

3. Performing the experiments

This is the fun part of the process. Make sure to go through the process to design and validate an experiment that is discussed in other parts of the blog, such as: panel design, instrument optimization, and preparing for your first experiment.

In the case of our hypothetical experiment, flow cytometry was chosen to measure the CD8+ cells in our patients, and to determine the amount of phosphatidylserine on the surface of these cells using the Annexin V reagent.

After the experiment is done, the primary analysis will be performed. A guide about gating and primary analysis can be found here. The important point is that the correct numerical data is extracted from the primary analysis, and the gating is done using all the controls necessary to identify the populations of interest.

4. Performing the statistical test

This is where everything comes together. The experiments are complete, and the numerical data has been extracted.

In the case of our hypothetical experiment, CD8+ cells were identified through immunostaining and the amount of Annexin V staining was described by fluorescence intensity.

Since this is a measure of changes in fluorescent intensity, the data is used to calculate the resolution metric, the RD. The RD is: medianpos-medianneg/rSDpos+rSDneg

Thus, the dataset is the list of RD for patients who have been treated versus those who have not been treated. From there, the statistical analysis will be performed.

In this experiment, the statistical test that will be used is the Student’s T-Test. This test relies on the T-distribution, which looks like a normal distribution, but has heavier tails, because of something called the “degrees of freedom”.

Mathematically, the degrees of freedom are measured as the sample size minus 1 (n-1). What does the degrees of freedom really mean?

This is the constraint of the system. For example, if you know 100 values and have the mean, the first 99 can take any value. The 100th value is completely constrained because that value must make the average that has been determined. The same applies for the median. Thus, only n-1 values have freedom.

Back to the T-distribution, the figure below shows the difference between a normal distribution (green), and a distribution with 3 degrees of freedom (blue), and 10 degrees of freedom (red), and the 95% confidence intervals are the vertical lines of the same colors.

As can be seen, the fewer the number of samples, the larger the differences have to be to determine statistical significance. The “rejection region” is the area under the curve to the left and right of the 95% lines. If this was a one-tailed test, then only one of these areas is considered.

Mathematically, there are 2 ways to determine if the data is statistically significant. In either case, we calculate the T statistic, which is defined as:

Based on the threshold (1-CI, or α), it is possible to calculate the Critical Value (CV). The decision rule becomes if T*>CV, we reject the H0, and thus accept the HA. The alternative way to use this is to use T* to calculate the p value. Under this method, if the α>p, we would reject the H0, and thus accept the HA.

It doesn’t matter which method you use, although the second method is a bit easier to understand and easier to compare to the threshold. Both approaches get you to the same answer, but most statistical programs use the p value method.

FIGURE 1: Comparison of the T distributions with the Standard Normal Distribution.

5. Stating the results

The last step is to state the conclusion. Based on the results of the statistical analysis, there are 2 options: either the H0 is rejected, or not.

If we reject the H0, that means we accept that we are confident in the HA.

One way to consider the interpretation of the p-value is to say that if a random sample were chosen, what is the probability that it would be at least as extreme as the observed data, under the assumption that the null hypothesis is true?

Thus, it can be interpreted as if the null hypothesis was really true, it would be extremely unlikely to see the kind of data that is being observed.

Does that mean that if we don’t reject the H0, we are equally confident that it is true? The answer is no — all it means is that there is not sufficient evidence to reject the H0, which is very different than saying that you should accept the H0.

Is that all? Does it stop at this point? Historically, this was the end of the analysis and papers with the phrase, “statistically significant” used to describe the data was published and research marched on.

However, over the last few years, the statistical community has been having a debate about the power of the p-value and how to properly interpret this. This article by Regina Nuzzo describes in detail some of the issues around how the p-value is interpreted, suggesting that, “…it is not as reliable as scientists assume.” This has lead the American Statistical Association to develop a statement about the p-value.

There is a role for the p-value in hypothesis testing, however, it is critical that we continue to evaluate the best tools to define statistical significance. These ideas will be the topics of future blog posts, so stay tuned!

To learn more about 5 Considerations For Statistical Analysis Of Flow Cytometry Data, and to get access to all of our advanced materials including 20 training videos, presentations, workbooks, and private group membership, get on the Flow Cytometry Mastery Class wait list.

Join Expert Cytometry's Mastery Class
Tim Bushnell, PhD
Tim Bushnell, PhD

Tim Bushnell holds a PhD in Biology from the Rensselaer Polytechnic Institute. He is a co-founder of—and didactic mind behind—ExCyte, the world’s leading flow cytometry training company, which organization boasts a veritable library of in-the-lab resources on sequencing, microscopy, and related topics in the life sciences.

Similar Articles

Tools to Improve Your Panel Design – Determining Antigen Density

Tools to Improve Your Panel Design – Determining Antigen Density

By: Tim Bushnell, PhD

When a researcher chooses to use flow cytometry to answer a scientific question, they first have to build a polychromatic panel that will take advantage of the power of the technology and experimental design. When we set up to use flow cytometry to answer a scientific question, we have to design a polychromatic panel that will allow us to identify the cells of interest – the target of the research.  To identify these cells, we need to build a panel that takes advantage of the relative brightness of the fluorochromes, the expression level of the different proteins on the cell,…

This Is How Full Spectrum Cytometry Works

This Is How Full Spectrum Cytometry Works

By: Tim Bushnell, PhD

There are 4 major ways to sort cells. The first way can use magnetic beads coupled to antibodies and pass the cells through a magnetic field. The labeled cells will stick, and the unlabeled cells will remain in the supernatant. The second way is to use some sort of mechanical force like a flapper or air stream that separates the target cells from the bulk population. The third way is the recently introduced microfluidics sorter, which uses microfluidics channels to isolate the target cells. The last method, which is the most common––based on Fuwyler’s work––is the electrostatic cell sorter. This…

My Proven 5-Point Fast Track To A Career In Flow

My Proven 5-Point Fast Track To A Career In Flow

By: Tim Bushnell, PhD

There are 4 major ways to sort cells. The first way can use magnetic beads coupled to antibodies and pass the cells through a magnetic field. The labeled cells will stick, and the unlabeled cells will remain in the supernatant. The second way is to use some sort of mechanical force like a flapper or air stream that separates the target cells from the bulk population. The third way is the recently introduced microfluidics sorter, which uses microfluidics channels to isolate the target cells. The last method, which is the most common––based on Fuwyler’s work––is the electrostatic cell sorter. This…

Up Your Stain Game With These 7 Non-Fluorescent Histology Dyes

Up Your Stain Game With These 7 Non-Fluorescent Histology Dyes

By: Heather Brown-Harding, PhD

There are 4 major ways to sort cells. The first way can use magnetic beads coupled to antibodies and pass the cells through a magnetic field. The labeled cells will stick, and the unlabeled cells will remain in the supernatant. The second way is to use some sort of mechanical force like a flapper or air stream that separates the target cells from the bulk population. The third way is the recently introduced microfluidics sorter, which uses microfluidics channels to isolate the target cells. The last method, which is the most common––based on Fuwyler’s work––is the electrostatic cell sorter. This…

3 Ways Flow Cytometry Can Be Used To Research Bacteria

3 Ways Flow Cytometry Can Be Used To Research Bacteria

By: Tim Bushnell, PhD

There are 4 major ways to sort cells. The first way can use magnetic beads coupled to antibodies and pass the cells through a magnetic field. The labeled cells will stick, and the unlabeled cells will remain in the supernatant. The second way is to use some sort of mechanical force like a flapper or air stream that separates the target cells from the bulk population. The third way is the recently introduced microfluidics sorter, which uses microfluidics channels to isolate the target cells. The last method, which is the most common––based on Fuwyler’s work––is the electrostatic cell sorter. This…

Avoid Flow Cytometry Faux Pas: How To Set Voltage The Right Way

Avoid Flow Cytometry Faux Pas: How To Set Voltage The Right Way

By: Tim Bushnell, PhD

There are 4 major ways to sort cells. The first way can use magnetic beads coupled to antibodies and pass the cells through a magnetic field. The labeled cells will stick, and the unlabeled cells will remain in the supernatant. The second way is to use some sort of mechanical force like a flapper or air stream that separates the target cells from the bulk population. The third way is the recently introduced microfluidics sorter, which uses microfluidics channels to isolate the target cells. The last method, which is the most common––based on Fuwyler’s work––is the electrostatic cell sorter. This…

Discover The Myriad Applications Of Beads In Flow Cytometry

Discover The Myriad Applications Of Beads In Flow Cytometry

By: Tim Bushnell, PhD

What is the single-most important feature of a flow cytometry experiment? Arguably, it’s the stained cells that gather data about biological processes of interest. However, a flow cytometer can measure cell-like particles as well as cells, which opens the realm of cytometry to the use of microspheres. Most researchers are familiar with the 4-Cs that beads can be used for: Control, Calibration, Compensation, and Counting. Beyond the 4-Cs, many are familiar with the multiplex bead assays for measuring analytes. Today, we will take a look beyond these well-known uses and discover the myriad applications of the “Mighty Microspheres.”

Mass Cytometry Revolves Around These 5 Things

Mass Cytometry Revolves Around These 5 Things

By: Tim Bushnell, PhD

Because mass cytometry allows users to characterize masses so effectively, data can be normalized much more efficiently than what traditional fluorescent flow will permit. If there is no working CyTof at your institution, you can still partner with CyTof-friendly research institutions that have the technology on hand. And because the samples are fixed, you can ship them overnight. This way, they will be analyzed for you. Today’s article will summarize the functionality of mass cytometry technology. This tech has been commercialized largely by Fluidigm in the CyTof systems. There are 5 key points to cover, or takeaways, that cytometrists should…

3 Ways To Improve Flow Cytometry Troubleshooting

3 Ways To Improve Flow Cytometry Troubleshooting

By: Tim Bushnell, PhD

A lot of the troubleshooting is focused on fluidics issues. If you sit down and think about your workflow, and how you might want to add a couple of little tweaks here and there which will ultimately help you improve the quality of your data as well as aid you in identifying issues before they become problems your troubleshooting will be much smoother. Consider these three things, what do you before you start collecting data, ensure you have appropriate plots of time vs fluorescence for each of the lasers your using and apply appropriate gating procedures.

Top Technical Training eBooks

Get the Advanced Microscopy eBook

Get the Advanced Microscopy eBook

Heather Brown-Harding, PhD

Learn the best practices and advanced techniques across the diverse fields of microscopy, including instrumentation, experimental setup, image analysis, figure preparation, and more.

Get The Free Modern Flow Cytometry eBook

Get The Free Modern Flow Cytometry eBook

Tim Bushnell, PhD

Learn the best practices of flow cytometry experimentation, data analysis, figure preparation, antibody panel design, instrumentation and more.

Get The Free 4-10 Compensation eBook

Get The Free 4-10 Compensation eBook

Tim Bushnell, PhD

Advanced 4-10 Color Compensation, Learn strategies for designing advanced antibody compensation panels and how to use your compensation matrix to analyze your experimental data.