Flow Cytometry Statistics
Understanding statistics and flow cytometry statistical analysis is critical to interpreting flow cytometry data.
One of the strengths of flow cytometry is that it generates large amounts of data that are amenable to statistical analysis of our populations of interest. Using the standard set of statistical analysis tools allows for hypothesis testing and, ultimately, for determining whether differences in the datasets are statistically significant.
There are two basic classes of questions that are typically asked in flow cytometry. The first class relates to changes in the number or percentage of a specific population upon treatment or in a disease state. A hypothesis in this class might look like this:
Case 1: In patients suffering from Bowden’s Malady, treatment with Pescaline D causes no change in the percentage of CD86+ memory T cells.
The second class of questions asked in flow cytometry relates to changes in the expression of a given antigen upon treatment or in a disease state. A hypothesis in this class might be phrased as:
Case 2: In patients suffering from Bowden’s Malady, treatment with Pescaline D causes no change to the expression of Interferon gamma on CD86+ memory T cells.
Once the question is determined, an appropriate experiment would be performed with sufficient replicates (as determined by a power calculation). The correct data can then be extracted for statistical analysis.
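As a rough illustration of what a power calculation involves, the sketch below estimates the replicates needed per group for a two-group comparison. It uses the textbook normal approximation, which is an assumption on our part and not necessarily the method a given statistics package applies. The function name `samples_per_group` and the planning values are hypothetical.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate replicates per group for a two-sided, two-group comparison.

    effect_size is the expected difference divided by the expected standard
    deviation; the investigator has to supply this (assumed) value.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided threshold
    z_power = NormalDist().inv_cdf(power)          # desired power
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)  # round up to whole replicates


# Hypothetical planning value: a difference of one standard deviation.
print(samples_per_group(effect_size=1.0))
```

Smaller expected effects require many more replicates, since the estimate scales with the inverse square of the effect size.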
In Case 1, the data would be the percentage of CD86+ memory T cells in patients with Bowden’s Malady +/- treatment. These data would be compared using a t-test to determine significance. To perform the t-test, the investigator would need to define the threshold (the α value) before the analysis, and then calculate the P value.
When P < α: reject the null hypothesis; the difference is ‘statistically significant’.
When P ≥ α: the null hypothesis cannot be rejected; the difference is ‘not statistically significant’.
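The decision rule above can be sketched in a few lines. This example assumes unpaired groups and uses Welch's unequal-variance form of the two-sample t-test, which is one common choice. If the same patients were measured before and after treatment, a paired test would be more appropriate. The percentages are made up for illustration, and the P value is obtained by numerically integrating the t density, whereas in practice a statistics package would normally do this step.

```python
import math
from statistics import mean, variance


def two_sided_p(t, df, steps=2000):
    """Two-sided P value for Student's t, by Simpson integration of the density."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


def welch_t_test(a, b):
    """Return (t, df, p) for Welch's unequal-variance two-sample t-test."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, two_sided_p(t, df)


# Hypothetical percentages of CD86+ memory T cells (illustration only).
untreated = [12.1, 10.8, 11.5, 13.0, 12.4, 11.9]
treated = [14.2, 15.1, 13.8, 14.9, 15.6, 14.4]

alpha = 0.05  # threshold chosen before looking at the data
t, df, p = welch_t_test(untreated, treated)
verdict = "statistically significant" if p < alpha else "not statistically significant"
print(f"t = {t:.2f}, df = {df:.1f}, P = {p:.4g}: {verdict}")
```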
In Case 2, the data that needs to be extracted is the central tendency of the expression of Interferon gamma on the CD86+ memory T cells. This is best represented as the Median Fluorescent Intensity (MFI). Additionally, the robust Standard Deviation (rSD) should be calculated, as it measures the spread of the data around the Median.
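To make these two statistics concrete, here is a minimal sketch that computes a median intensity and a robust SD from a list of event intensities. The rSD is taken here as 1.4826 times the median absolute deviation, which is one common definition. Analysis software may define rSD differently (for example, from percentiles), so treat this as an assumption. The intensity values are hypothetical.

```python
from statistics import median


def robust_sd(values):
    """Robust SD as 1.4826 x the median absolute deviation (assumed definition)."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    return 1.4826 * mad


# Hypothetical fluorescence intensities, including one extreme event.
intensities = [410, 455, 430, 470, 445, 5200, 438]

mfi = median(intensities)
rsd = robust_sd(intensities)
print(f"MFI = {mfi}, rSD = {rsd:.1f}")
```

The single extreme event barely moves either value, which is the reason for preferring the median and rSD over the mean and standard deviation for this kind of data.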
Before you move to hypothesis testing, it is often best to convert these data to a fold-over-background value or a resolution metric (RD) value. This is especially important when comparing multiple experiments.
The RD is the better choice because it accounts for the spread of the data, not just the separation between experimental and control.
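$$\mathrm{RD} = \frac{\mathrm{Median}_{\mathrm{exp}} - \mathrm{Median}_{\mathrm{ctl}}}{\mathrm{rSD}_{\mathrm{exp}} + \mathrm{rSD}_{\mathrm{ctl}}}$$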
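The RD is the difference between the experimental and control medians divided by the sum of their robust standard deviations. A small worked example, with a hypothetical function name and made-up input values:

```python
def resolution_metric(median_exp, median_ctl, rsd_exp, rsd_ctl):
    """RD = (Median_exp - Median_ctl) / (rSD_exp + rSD_ctl)."""
    return (median_exp - median_ctl) / (rsd_exp + rsd_ctl)


# Hypothetical values: medians of 500 and 100, rSDs of 30 and 20.
rd = resolution_metric(median_exp=500.0, median_ctl=100.0, rsd_exp=30.0, rsd_ctl=20.0)
print(rd)  # (500 - 100) / (30 + 20)
```

An RD of 0 means the experimental and control medians coincide, which is why 0 is the natural reference value for testing.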
Once the RD is calculated for each replicate, you can move to hypothesis testing using a one-sample t-test against a hypothetical mean. In this case, the hypothetical mean would be 0, which corresponds to no separation between experimental and control. Again, the investigator would need to define the threshold (the α value) and calculate the P value.
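A test against a hypothetical mean of 0 might be sketched as follows. The RD values are invented for illustration, the helper names are hypothetical, and the P value is again obtained by numerically integrating the t density instead of calling a statistics package.

```python
import math
from statistics import mean, stdev


def two_sided_p(t, df, steps=2000):
    """Two-sided P value for Student's t, by Simpson integration of the density."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def one_sample_t_test(values, hypothetical_mean=0.0):
    """Return (t, df, p) for a one-sample t-test against hypothetical_mean."""
    n = len(values)
    standard_error = stdev(values) / math.sqrt(n)
    t = (mean(values) - hypothetical_mean) / standard_error
    return t, n - 1, two_sided_p(t, n - 1)


# Hypothetical RD values, one per replicate experiment.
rd_values = [1.8, 2.4, 2.1, 1.6, 2.9]

alpha = 0.05  # threshold chosen before looking at the data
t, df, p = one_sample_t_test(rd_values, hypothetical_mean=0.0)
print(f"t = {t:.2f}, df = {df}, P = {p:.4g}, reject null: {p < alpha}")
```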
The caveat for the t-test is that it assumes the data follow a Gaussian distribution. If your data are not Gaussian distributed, there are analogous non-parametric tests that can be performed. These also report a P value, which is compared to the α value in the same way to determine statistical significance.
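As one example of a non-parametric alternative for two unpaired groups, the sketch below implements an exact Mann-Whitney U test by enumerating every way of relabelling the pooled data. The choice of this particular test is ours, since no specific test is named above, and the data are hypothetical. Full enumeration is only practical for small groups.

```python
from itertools import combinations


def u_statistic(a, b):
    """Mann-Whitney U for group a: each win counts 1, each tie counts 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def exact_mann_whitney(a, b):
    """Two-sided exact P value from all relabellings of the pooled data."""
    pooled = list(a) + list(b)
    n_a = len(a)
    center = n_a * len(b) / 2  # expected U under the null hypothesis
    observed = abs(u_statistic(a, b) - center)
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), n_a):
        picked = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in picked]
        total += 1
        # Count relabellings at least as far from the center as the real data.
        if abs(u_statistic(group_a, group_b) - center) >= observed:
            extreme += 1
    return extreme / total


# Hypothetical percentages of CD86+ memory T cells (illustration only).
untreated = [12.1, 10.8, 11.5, 13.0, 12.4]
treated = [14.2, 15.1, 13.8, 14.9, 15.6]

alpha = 0.05
p = exact_mann_whitney(untreated, treated)
print(f"P = {p:.4f}, reject null: {p < alpha}")
```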
These basic pairwise comparison tests allow for determination of statistical significance between two populations. If you have more than two populations, or more complex questions, there are additional statistical tools that can be used, such as regression analysis and ANOVA.
ABOUT TIM BUSHNELL, PHD
Tim Bushnell holds a PhD in Biology from the Rensselaer Polytechnic Institute. He is a co-founder of—and didactic mind behind—ExCyte, the world’s leading flow cytometry training company, which organization boasts a veritable library of in-the-lab resources on sequencing, microscopy, and related topics in the life sciences.