Common Numbers-Based Questions I Get As A Flow Cytometry Core Manager And How To Answer Them
Numbers are all around us. My personal favorite is ≅1.618 aka ɸ aka ‘the golden ratio’. It’s found throughout history, where it has influenced architects and artists. We see it in nature, in plants, and it is used in movies to frame shots. It can be approximated by the Fibonacci sequence (another math favorite of mine). However, I have not worked out how to apply this to flow cytometry.
That doesn’t mean numbers aren’t important in flow cytometry. They are central to everything we do, and in this blog, I’m going to flit around numbers-based questions that I have received and share my thoughts on them.
Question – “What Voltage Should I Use?”
What makes a good voltage is a question I get on a regular basis. A good voltage should meet the following criteria (thanks to Joe Trotter for discussions on this topic):
- The unstained cells are in a region where electronic noise is no more than 10-20% of the total variance.
- The signals are within the linear dynamic range of the detector.
- There is room above the brightest signal in case there is an increase of expression.
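The first criterion can be checked with a little arithmetic. The sketch below is only an illustration: it assumes electronic noise and the other sources of variance are independent, so that variances add, and it assumes you already have a standard deviation for the electronic noise and for the unstained population at the voltage in question. The function names and the example values are hypothetical.

```python
def noise_variance_fraction(sd_electronic_noise: float, sd_unstained: float) -> float:
    """Fraction of the unstained population's total variance that comes
    from electronic noise, assuming independent noise sources (variances add)."""
    if sd_unstained <= 0:
        raise ValueError("sd_unstained must be positive")
    return (sd_electronic_noise / sd_unstained) ** 2


def passes_noise_check(sd_electronic_noise: float, sd_unstained: float,
                       max_fraction: float = 0.20) -> bool:
    """True when electronic noise contributes no more than max_fraction
    (20% by default, the upper end of the 10-20% guideline) of the variance."""
    return noise_variance_fraction(sd_electronic_noise, sd_unstained) <= max_fraction


# Made-up example: noise SD of 20 and unstained SD of 50 at the same voltage.
fraction = noise_variance_fraction(20, 50)
acceptable = passes_noise_check(20, 50)
```

With these made-up numbers the fraction is (20/50)² = 0.16, so the voltage would pass at the 20% limit but fail a stricter 10% limit.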
Gone are the days when we put our negatives in the first decade. We now have methods to characterize the sensitivity of the detector, and many vendors are releasing tools specific to their instruments. For PMT-based systems, I prefer to use the ‘peak 2’ method, which was described in this paper. The results of this analysis are shown in figure 1.
This will give you a minimum starting voltage for your signals. There are several other methods to determine this, including the OEM methods. ThermoFisher published a summary of these methods. The bottom line, however, is that it’s important to identify the minimum starting voltage. Below those levels, the spread of the data (as measured by the rCV) increases, reducing the ability to separate dim from negative.
So, make sure that you know the number to stay above for each detector!
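As a rough illustration of finding that number, the sketch below takes a voltage titration of a dim bead population and returns the lowest voltage at which the rCV has flattened out. This is a simplified plateau rule of my own choosing, not the published ‘peak 2’ procedure or any vendor's method: the plateau is taken as the lowest rCV observed, and the 10% tolerance, the function name, and the data are all assumptions for illustration.

```python
from typing import List, Optional, Tuple


def minimum_voltage(titration: List[Tuple[float, float]],
                    tolerance: float = 0.10) -> Optional[float]:
    """Return the lowest voltage from which the rCV stays on the plateau.

    titration: (voltage, rCV) pairs for a dim bead peak.
    The plateau is the smallest rCV seen; a voltage qualifies when its rCV
    and the rCV at every higher voltage are within `tolerance` of it.
    Returns None if even the highest voltage is off the plateau.
    """
    if not titration:
        raise ValueError("titration must contain at least one point")
    ordered = sorted(titration)
    threshold = min(rcv for _, rcv in ordered) * (1 + tolerance)
    best = None
    # Walk down from the highest voltage until the rCV leaves the plateau.
    for voltage, rcv in reversed(ordered):
        if rcv > threshold:
            break
        best = voltage
    return best


# Made-up titration data: (PMT voltage, rCV in percent).
example = [(350, 42.0), (400, 25.0), (450, 14.0), (500, 9.0),
           (550, 8.2), (600, 8.0), (650, 8.1)]
start_voltage = minimum_voltage(example)
```

For this made-up titration the rule picks 550, the first setting where the rCV is within 10% of the best value and stays there. In practice you would still look at the plot rather than trust a single cutoff.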
Question – “How Many Controls Do I Really Need?”
Again, I will rely on the classic business school answer: ‘it depends.’ The real question is how many controls are needed to properly interpret your data. Compensation controls are a must to ensure proper compensation, so that is one for each fluorochrome in the panel. An unstained control is nice as well, but not absolutely essential, especially if you have a positive and negative population in each comp tube.
Quality controls are also important. These can be separated into a bead-based control that is used to track voltage and instrument performance over time, and a standard ‘reference’ control that is used to show the assay is working properly each time it’s run. A quick reminder: when using quality controls, it’s not sufficient to just run them with each experiment. You also need to track their performance over time. As the old adage says, “If you don’t write it down, it didn’t happen.”
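One simple way to track a quality control over time is a Levey-Jennings style check: establish a baseline from earlier runs, then flag any new run that falls outside the baseline mean plus or minus a chosen number of standard deviations. The sketch below assumes a ±2 SD limit and uses made-up bead MFI values. The function name and run labels are hypothetical, and your lab's acceptance limits may differ.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def flag_qc_runs(baseline: Sequence[float],
                 new_runs: Sequence[Tuple[str, float]],
                 n_sd: float = 2.0) -> List[Tuple[str, float]]:
    """Return the (label, value) runs outside baseline mean +/- n_sd * SD."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation of the baseline runs
    lower = center - n_sd * spread
    upper = center + n_sd * spread
    return [(label, value) for label, value in new_runs
            if not lower <= value <= upper]


# Made-up bead MFI values from five baseline runs and three later runs.
baseline_mfi = [10000, 10200, 9900, 10100, 9800]
later_runs = [("run 6", 10150), ("run 7", 9600), ("run 8", 10400)]
flagged = flag_qc_runs(baseline_mfi, later_runs)
```

Here the baseline mean is 10000 with an SD of about 158, so runs 7 and 8 fall outside the limits and deserve a closer look, while run 6 does not.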
Gating controls are another important class of controls to run. These will help you set the gates properly so that you can find your populations of interest. The most famous of these controls is the FMO, or fluorescence minus one, control. A lot has been written about this control, and its importance cannot be overstated, especially when trying to gate rare cells and emergent markers, or for any population that requires the FMO to identify properly.
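To make the ‘minus one’ idea concrete, the sketch below lists the tubes implied by a panel: one single-stained compensation control per fluorochrome, and one FMO tube per fluorochrome that contains every reagent except the one being gated. The panel and the function names are hypothetical, and in practice you may only need FMOs for the markers that are hard to gate.

```python
from typing import Dict, List


def compensation_tubes(panel: List[str]) -> Dict[str, List[str]]:
    """One single-stained tube for each fluorochrome in the panel."""
    return {f"Comp-{fluor}": [fluor] for fluor in panel}


def fmo_tubes(panel: List[str]) -> Dict[str, List[str]]:
    """One tube per fluorochrome, stained with everything except that one."""
    return {f"FMO-{omitted}": [fluor for fluor in panel if fluor != omitted]
            for omitted in panel}


# Hypothetical four-color panel.
panel = ["FITC", "PE", "APC", "BV421"]
comp = compensation_tubes(panel)
fmo = fmo_tubes(panel)
```

For this four-color panel, the FMO tube for PE is stained with FITC, APC, and BV421 only, so any signal in the PE detector comes from spread and background rather than from the PE reagent.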
Less well known, but equally important, is the isoclonal control, which is useful in identifying whether a cell is binding the fluorochrome. This binding has been reported in multiple publications, for example here and here. As an aside, this article offers some suggestions on blocking this phenomenon. The internal negative control is another useful gating control that takes advantage of populations in your sample that are known not to express your marker of interest. Last, but by no means least, is the stimulation control, which is discussed in this paper. It is very useful when running stimulated samples, and is used in conjunction with the FMO control.
You will notice I don’t recommend the isotype control. I won’t go into details here, but you can read about the strengths and weaknesses of the isotype control here and decide for yourself its value (or not).
As you can see, there are a lot of controls needed for a good flow cytometry experiment. Remember that the controls are the lifeblood of science, and without the controls, interpreting the results of the experimental samples may be difficult at best and impossible at worst.
How do you determine which control(s) are necessary? That is something that should occur during the optimization phase of panel development. As you work to determine the best conditions for staining your cells, work through the process with every control you can think of. Using those controls, you can develop your analysis template and determine which controls are necessary to identify the target cells of interest. That way, when you move to validation, you have a handle on what is necessary to interpret the data.
Question – “Mean, Mode, Median, Which Should I Use?”
When analyzing flow cytometry data, the first step is to display the data in some form or another. Most commonly this involves univariate and bivariate plots, where regions of interest (i.e. gates) are used to reduce the data to find the target cells. Once those are identified, data are extracted. This could be the percentage of cells of interest or it could be the expression level of a marker of interest.
The question is what to do with those data. Plots are a nice way to show the data, but they don’t get to the meat of the matter. Do the data support the conclusion the researcher is suggesting? That requires that we take multiple experiments and compare them in some statistical manner. This means we need to describe the populations mathematically.
To describe the data, we can look at two numbers – the central tendency of the data and the spread of the data. These two numbers give you a mathematical picture of what the data are doing.
To describe the central tendency, there are three commonly used measures: the mean, the median, and the mode. The mean can be measured as the arithmetic or geometric mean. Both assume a normal distribution and can be influenced by outliers and skewed data. The median is the central number in the dataset, doesn’t assume a model, and is less influenced by skewed data. The mode represents the most common data point in the dataset.
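A small example shows how differently these measures react to skew. The sketch below uses Python's standard `statistics` module on made-up intensity values that include a single bright outlier. The function name and the data are hypothetical, and the geometric mean requires all values to be positive.

```python
from statistics import geometric_mean, mean, median, multimode
from typing import Dict, List, Sequence, Union


def central_tendency(values: Sequence[float]) -> Dict[str, Union[float, List[float]]]:
    """Summarize a set of positive intensity values four ways."""
    return {
        "arithmetic_mean": mean(values),
        "geometric_mean": geometric_mean(values),  # needs all values > 0
        "median": median(values),
        "modes": multimode(values),  # every most-common value, as a list
    }


# Made-up fluorescence intensities with one bright outlier.
intensities = [90, 100, 100, 110, 120, 130, 150, 5000]
summary = central_tendency(intensities)
```

For these values the single outlier drags the arithmetic mean up to 725, while the median stays at 115 and the mode at 100. The geometric mean lands between the median and the arithmetic mean.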
For the most part, flow cytometry focuses on using the mean or the median to describe the central tendency of a dataset. You have probably seen MFI used as an abbreviation. Don’t assume you know what it means: it can stand for mean or median fluorescence intensity, so make sure you can find the definition in the paper you are reading, and make sure you define it for your own readers. In the case of expression level data, the median is the most common measure of the central tendency because it’s more robust than the mean.
The second number is the spread of the data, which can be expressed in several ways. The most common include the standard deviation, the ‘robust’ standard deviation, and the coefficient of variation. The sample variance can be calculated as $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, and the standard deviation (𝜎) is just the square root of the variance. The coefficient of variation, or the ‘normalized standard deviation’, is $CV = \frac{\sigma}{\bar{x}} \times 100\%$. The 𝜎 and CV are generally used with the mean. To describe the spread of the data using the median, we turn to the MAD, the rSD, and the rCV. The median absolute deviation is $MAD = \mathrm{median}(|x_i - \mathrm{median}(x)|)$. The robust SD is commonly calculated as $rSD = 1.4826 \times MAD$, and the robust CV as $rCV = \frac{rSD}{\mathrm{median}(x)} \times 100\%$. Again, the robust measures (median, rCV and rSD) are more resistant to outliers.
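The spread measures can be compared side by side in a few lines. The sketch below assumes the MAD-based definitions, with the rSD taken as 1.4826 × MAD, where 1.4826 is the usual constant that makes the MAD agree with the SD for normally distributed data. Some analysis packages compute the rSD from percentiles instead, so check how your software defines it. The function name and data are hypothetical.

```python
from statistics import mean, median, stdev
from typing import Dict, Sequence

MAD_TO_SD = 1.4826  # scales the MAD to match the SD for normal data


def spread_summary(values: Sequence[float]) -> Dict[str, float]:
    """Classical (mean-based) and robust (median-based) spread measures."""
    avg = mean(values)
    sd = stdev(values)  # sample SD, n - 1 in the denominator
    med = median(values)
    mad = median([abs(v - med) for v in values])
    rsd = MAD_TO_SD * mad
    return {
        "sd": sd,
        "cv_percent": 100 * sd / avg,
        "mad": mad,
        "rsd": rsd,
        "rcv_percent": 100 * rsd / med,
    }


# Made-up fluorescence intensities with one bright outlier.
intensities = [90, 100, 100, 110, 120, 130, 150, 5000]
spread = spread_summary(intensities)
```

For these values the MAD is 15, so the rSD is about 22 and the rCV about 19%. The single outlier inflates the classical SD and CV far beyond that, which is the resistance to outliers in action.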
At the end of the day, whatever measure you use, make sure to define it in the paper.
Flow cytometry is full of numbers. From the number of cells needed for an experiment to the data extracted for statistical analysis, there is a reliance on understanding what the numbers are, what they mean, and how they are derived. Understanding and communicating this information is essential to continuing to improve the reproducibility of your data. In conclusion, let me leave you with a quote from Carl Sagan: “I said there are maybe 100 billion galaxies and 10 billion trillion stars. It’s hard to talk about the Cosmos without using big numbers… But I never said ‘billions and billions.’ For one thing, it’s too imprecise.” In flow, let’s make sure we are precise with our numbers.