5 Steps For Accurate Flow Cytometry Statistical Analysis Results

At the end of many experiments, the question of statistical analysis rears its ugly head. When it comes up, many researchers freeze, not knowing how to proceed, and muddle through as best they can. With some proper planning and forethought, this doesn’t have to be the case.

To resolve this analysis dilemma, it is important to begin thinking about the statistical analysis during the initial design of the experiments.

This is where some critical decisions need to be made to ensure that, if there are statistically significant findings to be uncovered, the data will be sufficient to support them.

During initial experiment design, consider the following…

1. Power the flow cytometry experiment properly.

Simply put, the statistical power of an experiment is the likelihood that the experiment will detect an effect if there is one to be measured.

The more highly powered the experiment, the lower the chance of making a Type II (false negative) statistical error.

There are a variety of power calculators out there, and one of the most useful is StatMate, from GraphPad Software. Although the package is not available for OS X, it is a great program to add to your toolkit.

To use this tool, one needs to determine the statistical test to be performed, the threshold (α), and the standard deviation. With this information, a table such as the one shown below is generated.

Figure 1: Output from StatMate.

Across the top are columns for different power levels, and down the side are the numbers of samples per group.

The numbers in blue represent the difference between the means of the control and experimental samples.

To use the chart, calculate the expected difference between the experimental and control means, and find that value among the blue numbers in the column for the desired power.

Cross-referencing that blue number to its row gives the number of samples per group that must be run to achieve that power.

Setting the power at the beginning of the experiment is best practice and gives the researcher confidence that a statistically significant result will be found if there is one to be discovered.
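For those who prefer to script this calculation (or who cannot run StatMate on their machine), the same sample-size estimate can be made in Python with the statsmodels package. The sketch below is a minimal example; the expected difference, standard deviation, α, and power values are hypothetical placeholders to be replaced with values from your own pilot data.

```python
# Minimal power-analysis sketch using statsmodels (an alternative to StatMate).
# The difference, SD, alpha, and power below are hypothetical placeholders.
from statsmodels.stats.power import TTestIndPower

expected_difference = 10.0  # expected difference between control and treated means
standard_deviation = 8.0    # estimated standard deviation (e.g. from pilot data)
effect_size = expected_difference / standard_deviation  # Cohen's d

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=effect_size,
    alpha=0.05,              # significance threshold (see step 2)
    power=0.80,              # 80% chance of detecting the effect if it exists
    alternative="two-sided",
)
print(f"Samples needed per group: {n_per_group:.1f}")
```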

2. Establish the threshold (significance level) for your statistical test.

The threshold, denoted as the α, is the probability level below which the null hypothesis will be rejected.

This value is historically set at 0.05 (5%).

This value is also related to the possibility of making a Type I (false positive) statistical error and is based on work by Ronald Fisher from the 1920s, who suggested it was a “convenient cut-off to reject the null hypothesis.”

Consider the normal distribution, as shown below. If a two-tailed t-test is performed on the data with a threshold of 0.05, that 5% is split evenly between the two tails, above and below the mean. The white areas represent that 5%.

Figure 2: The normal distribution. From https://en.wikipedia.org/wiki/File:NormalDist1.96.png, used under the GNU Free Documentation License.

The P value will be compared to the α to determine whether the null hypothesis can be rejected.
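As a simple illustration of how α maps onto the distribution and onto the decision rule, the sketch below uses scipy to recover the familiar 1.96 critical value for a two-tailed test at α = 0.05 and compares a hypothetical p-value against the threshold.

```python
# Sketch of the alpha threshold and the p-value decision rule; the p-value is
# a hypothetical placeholder, not a real result.
from scipy import stats

alpha = 0.05

# For a two-tailed test, half of alpha sits in each tail, so the critical
# z-value is the 97.5th percentile of the standard normal (about 1.96).
critical_z = stats.norm.ppf(1 - alpha / 2)
print(f"Critical z-value for a two-tailed test at alpha={alpha}: {critical_z:.2f}")

# Decision rule: reject the null hypothesis only if the p-value falls below alpha.
p_value = 0.032  # hypothetical p-value from a statistical test
if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```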

Since the α is a measure of the risk of committing a Type I error, the consequences of a false positive must be considered when establishing the threshold.

A higher threshold makes it easier to find significance, but increases the possibility of a Type I error.

Lowering the threshold decreases the possibility of a Type I error, but makes significance harder to reach. Thus, the threshold should be set based on the specific conditions of the test.

Setting the threshold at the beginning of the experiment is a best practice, as it helps establish the probability of committing a Type I error.

3. Clearly state the hypothesis.

At the beginning of the experimental planning, it is critical to understand what hypothesis is being tested.

If the hypothesis is poorly stated, the rest of the statistical analysis will be inaccurate, or as it is said ‘Garbage In, Garbage Out.’

Since the hypothesis will be used to establish the null hypothesis (H0), this becomes the most important step in the process, as it forms the basis of why the experiments are being performed.

For example, when asked to determine if a new drug, Pescaline D, increases the number of CD4+ T-cells in patients with Bowden’s malady, one can design an appropriate experiment.

The null hypothesis for this experiment could be stated as: Pescaline D causes no change to the percent of CD4+ cells in patients suffering from Bowden’s Malady.

Setting the null hypothesis at the beginning of the experiment will assist in the design of the experiment, help evaluate the best controls to use, and guide the direction of the statistical test.

4. Choose the correct statistical test.

The statistical test should be identified at the beginning of the experiment.

Based on the null hypothesis, the correct testing method should be clear.

Some common statistical tests, and when they should be used, are listed here:

Figure 3: Suggested statistical tests. A more complete list can be found here: http://www.graphpad.com/support/faqid/1790/

This choice will influence the data that is extracted from the primary analysis.

In the case of the example above, the data would be the percentage of CD4+ cells in the control (untreated) and experimental (treated) groups.

This would be tested using an unpaired t-test. However, if the experimental design were to take a sample from each patient before treatment (control) and again after treatment (experimental), one would perform a paired t-test.

For those performing t-tests, another consideration is whether to do the test as a ‘one-tailed’ or ‘two-tailed’ t-test.

This decision also has to be made before the experiments are performed.

If the expected change is in one direction (that is, an increase or a decrease is predicted), then a one-tailed t-test is appropriate.

On the other hand, if it is not known in which direction the change will occur, a two-tailed t-test is the best choice. This is defined at the beginning of the experiment to avoid the temptation to look at the data and choose a one- or two-tailed t-test after the fact.
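To make these distinctions concrete, the sketch below shows how the unpaired, paired, and one-tailed t-tests might be run in Python with scipy. The CD4+ percentages are hypothetical values included only to make the example runnable; the one-tailed call assumes an increase was predicted before the experiment.

```python
# Sketch of unpaired vs. paired t-tests with scipy; the CD4+ percentages are
# hypothetical, not real patient data.
from scipy import stats

untreated = [28.1, 30.4, 26.7, 29.8, 31.2, 27.5]  # % CD4+ cells, control
treated   = [33.6, 35.1, 30.9, 36.4, 34.2, 32.8]  # % CD4+ cells, after treatment

# Unpaired (independent) t-test: separate control and treated groups.
t_stat, p_unpaired = stats.ttest_ind(untreated, treated)

# Paired t-test: the same patients measured before and after treatment.
t_stat, p_paired = stats.ttest_rel(untreated, treated)

# One-tailed paired t-test, if an increase after treatment was predicted in
# advance: 'less' tests whether untreated values are lower than treated ones.
t_stat, p_one_tailed = stats.ttest_rel(untreated, treated, alternative="less")

print(p_unpaired, p_paired, p_one_tailed)
```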

Choosing the appropriate statistical test at the beginning of the experimental design process is the best practice to prevent bias.

This will ensure that there is no experimenter bias introduced after the data is collected and will also ensure the correct data is extracted from the primary analysis.

5. Know how to plot your data and do it first.

Although it may sound strange, it is very valuable to plot your data before you move forward with your statistical analysis.

Back in 1973, the statistician Francis Anscombe published the now famous Anscombe’s quartet:

Figure 4: Anscombe’s quartet. From https://commons.wikimedia.org/wiki/File:Anscombe%27s_quartet_3.svg, used under the GNU General Public License.

These four datasets have nearly identical summary statistics, including the mean, the sample variance, the regression line, and the correlation coefficient.

Anscombe published this dataset when many researchers were starting to use computers for their statistical analysis, and entering data without graphing it. This dataset was designed to point out the fact that graphing data is a critical first step and an important check on the researcher as well. If something looks odd, it may be odd.
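To see Anscombe’s point for yourself, the short sketch below loads the quartet through seaborn’s example-data loader (which fetches the dataset over the internet, an assumption of this example) and prints the near-identical summary statistics for each of the four datasets.

```python
# Sketch: load Anscombe's quartet and show the near-identical summary statistics.
# Assumes an internet connection, since seaborn fetches its example datasets online.
import seaborn as sns

anscombe = sns.load_dataset("anscombe")  # columns: dataset, x, y

for name, group in anscombe.groupby("dataset"):
    print(
        f"Dataset {name}: mean y = {group['y'].mean():.2f}, "
        f"variance y = {group['y'].var():.2f}, "
        f"correlation = {group['x'].corr(group['y']):.2f}"
    )

# Plotting each dataset (for example with sns.lmplot) immediately shows how
# different they really are, despite the matching statistics.
```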

When plotting data, it is good practice to use a plot that shows all the data points.

Take a look at these two graphs.

Figure 5: Percentage of CD4+ cells before (U) and after (T) treatment.

The bar graph on the left shows the same data as the scatter graph on the right.

The difference is that with the scatter graph, it is possible to see that there are two different levels of response in the data after treatment, which is lost in the bar graph. Thus, information that is visible in the scatter graph is hidden, or not fully evaluated, in the bar graph.
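As a simple illustration of this difference, the sketch below draws both plots side by side with matplotlib. The CD4+ percentages are hypothetical values chosen only to mimic the two response levels described above.

```python
# Sketch: bar graph of group means vs. a scatter plot of every data point.
# The CD4+ percentages are hypothetical values for illustration only.
import matplotlib.pyplot as plt

untreated = [27.5, 28.9, 26.3, 29.1, 27.8, 28.4]  # % CD4+ cells, before (U)
treated   = [31.2, 32.5, 30.8, 41.6, 42.3, 40.9]  # % CD4+ cells, after (T)

fig, (bar_ax, dot_ax) = plt.subplots(1, 2, figsize=(8, 4))

# Bar graph: only the group means are shown, hiding the spread of the data.
bar_ax.bar(["U", "T"], [sum(untreated) / len(untreated), sum(treated) / len(treated)])
bar_ax.set_ylabel("% CD4+ cells")
bar_ax.set_title("Bar graph (means only)")

# Scatter plot: every point is visible, revealing the two response levels after treatment.
dot_ax.scatter([0] * len(untreated), untreated)
dot_ax.scatter([1] * len(treated), treated)
dot_ax.set_xticks([0, 1])
dot_ax.set_xticklabels(["U", "T"])
dot_ax.set_title("Scatter graph (all data points)")

plt.tight_layout()
plt.show()
```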

Knowing the best graph to use helps convey the key information and supports the statistical analysis that is performed on the data.

It is critical to prepare for your statistical analysis at the beginning of the experimental design process. This will ensure the correct data is extracted, the proper test applied, and that sufficient replicates are obtained so that if an effect is to be found, it will be found. Don’t rely on some magic number of events or samples to determine your experimental design. Rather, rely on the best statistical methods and comparisons to appropriate controls to ensure your data stands up to review.

To learn more about 5 Steps For Accurate Flow Cytometry Statistical Analysis Results, and to get access to all of our advanced materials including 20 training videos, presentations, workbooks, and private group membership, get on the Flow Cytometry Mastery Class wait list.

Tim Bushnell, PhD

Tim Bushnell holds a PhD in Biology from the Rensselaer Polytechnic Institute. He is a co-founder of, and the didactic mind behind, ExCyte, the world’s leading flow cytometry training company, which boasts a veritable library of in-the-lab resources on sequencing, microscopy, and related topics in the life sciences.
