6 Tips For Applying The Right Statistical Test To Your Flow Cytometry Data

Written by Tim Bushnell, PhD

Flow cytometry data are numbers rich.

Data from experiments can be population measurements (percent of CD4+ cells, for example) or expression levels (for example, the median fluorescence intensity of CD69 on activated T cells).

Many times, researchers are content to show histograms to illustrate their point after a flow experiment. This approach misses the opportunity to take those content-rich data and extend the work into a statistical analysis.

To properly perform statistical analysis, the first step is to understand the hypothesis. The hypothesis will guide the statistical analysis, identifying the correct test to be performed. There are several things that need to be considered when beginning the statistical analysis of the data.

1. Design your experiment properly from the start.

Statistical power is the probability of correctly rejecting the null hypothesis when the null hypothesis is false. Three factors influence the power of an experiment: the sample size, the spread of the data, and the number of replicates. The power of the experiment is related to its ability to avoid statistical errors.
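To make the link between sample size, spread, and power concrete, here is a minimal sketch using a normal approximation for a two-sided comparison of two group means. The function name `approx_power` and its parameters are illustrative assumptions, not part of any flow cytometry package, and a real experiment with small groups would normally use a t-based calculation instead.

```python
from statistics import NormalDist


def approx_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    Assumes both groups share a known standard deviation `sd` and have
    `n_per_group` samples each (a simplifying assumption for illustration).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between the two group means
    se = sd * (2 / n_per_group) ** 0.5
    shift = effect / se
    # Probability that the test statistic falls beyond either critical value
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


if __name__ == "__main__":
    # More samples raise power; more spread lowers it
    for n in (4, 8, 16, 32):
        print(n, round(approx_power(effect=1.0, sd=1.0, n_per_group=n), 3))
```

Running the loop shows power climbing as the number of samples per group grows, while increasing `sd` for a fixed `effect` pulls it back down.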

2. Know the classes of statistical errors and how to avoid them.

False positives (Type I errors) are when a true null hypothesis is incorrectly rejected. False negatives (Type II errors) are when the test fails to reject a false null hypothesis.

In fact, the power of the experiment is defined as 1 − β, where β is the probability of committing a Type II error. Equivalently, power is equal to true positives / (true positives + false negatives).

3. Use the appropriate statistical test.

The biological hypothesis and experimental design will determine which test is appropriate for the data. The distribution of the data is also important to consider. How best to determine the correct test? This table can help you determine which test is most appropriate.

4. Set the appropriate threshold.

The α value is the threshold that will be used to determine whether the data are statistically significant or not. For historical reasons, this value is usually set at 0.05. This can be interpreted as the chance of finding significance where there is none (i.e., the chance of committing a Type I error).
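One way to see what a 0.05 threshold means is to simulate many experiments in which the null hypothesis is true and count how often a test still reports significance. The sketch below is illustrative only: it assumes normally distributed data with a known standard deviation of 1 and uses a simple z-test, and the function name `false_positive_rate` is hypothetical.

```python
import random
from statistics import NormalDist


def false_positive_rate(alpha=0.05, n_per_group=10, n_experiments=5000, seed=0):
    """Fraction of null experiments declared significant by a two-sided z-test."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Both groups are drawn from the same distribution (mean 0, sd 1),
    # so every "significant" result is a Type I error.
    se = (2 / n_per_group) ** 0.5
    hits = 0
    for _ in range(n_experiments):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z_stat = (sum(a) / n_per_group - sum(b) / n_per_group) / se
        if abs(z_stat) > z_crit:
            hits += 1
    return hits / n_experiments


if __name__ == "__main__":
    print(false_positive_rate())
```

The returned fraction should sit close to the chosen threshold, which is the sense in which the threshold is the Type I error rate.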

5. Avoid the more significant trap.

Once the α value is set, the data are statistically significant if the P-value is below that value. With a threshold of 0.05, the data are not more significant when the P-value is 0.01 than when it is 0.04. If the goal is to decrease the chance of a Type I error, the threshold should be set to a more stringent level (0.01 or lower).

6. Avoid multiple pairwise comparisons.

In the case where the experimental design has Drug X, Drug Y and the combination of Drug X and Y, each to be compared to an untreated sample, what is the best test? Pairwise comparisons should not be performed in this case for the following reason. With α set to 0.05, there is a 5% chance of committing a Type I error. With each additional comparison, the chance of committing at least one Type I error increases, as shown in the table below.

Number of pairwise comparisons	Chance of a Type I error
2	10%
3	14%
4	19%
5	23%
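The growth in error rate can be reproduced with a short calculation. This sketch assumes the comparisons are independent, in which case the chance of at least one Type I error across k comparisons is 1 − (1 − α)^k. The function name `familywise_error_rate` is illustrative.

```python
def familywise_error_rate(n_comparisons, alpha=0.05):
    """Chance of at least one Type I error across independent comparisons."""
    return 1 - (1 - alpha) ** n_comparisons


if __name__ == "__main__":
    print("Comparisons\tChance of a Type I error")
    for k in range(1, 6):
        print(f"{k}\t{familywise_error_rate(k):.1%}")
```

Under the independence assumption the results land close to the rounded percentages in the table, and the rate keeps rising as more pairwise comparisons are added.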

At the end of the day, the statistical analysis of your flow cytometry data is a critical step in establishing the validity of the hypothesis being tested. With a careful and considered approach to performing the correct testing, the published data will stand up to the rigors of peer review and help lead to another discovery.

ABOUT TIM BUSHNELL, PHD

Tim Bushnell holds a PhD in Biology from the Rensselaer Polytechnic Institute. He is a co-founder of, and the didactic mind behind, ExCyte, the world’s leading flow cytometry training company, an organization that boasts a veritable library of in-the-lab resources on sequencing, microscopy, and related topics in the life sciences.
