The Need For Speed In Flow Cytometry Data Analysis

Speed is a highly touted metric in flow cytometry. Look at any vendor’s website and you will see highlighted how many events per second their instrument can acquire, how many cells can be sorted per second, and more. The limits are imposed by the physics of flow cytometry and the speed of pulse processing, among other factors. With cell sorters, Poisson statistics dominate the speed calculation. As has been discussed before, the optimal sort rate is about ¼ the frequency of droplet generation. Sorting faster will reduce the purity of the final product.
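To put numbers on the Poisson argument, here is a minimal sketch that treats cell arrival as a Poisson process, with the mean number of cells per droplet equal to the event rate divided by the droplet frequency. The function names and the 40,000 droplets-per-second figure are illustrative assumptions, not values from any particular sorter, and real sort purity also depends on the sort mode and sort envelope.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k cells in one droplet, given the mean cells per droplet."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def droplet_occupancy(event_rate: float, droplet_frequency: float) -> dict:
    """Fractions of droplets that are empty, hold one cell, or hold two or more."""
    mean = event_rate / droplet_frequency
    empty = poisson_pmf(0, mean)
    single = poisson_pmf(1, mean)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}

# Hypothetical sorter: 40,000 droplets/s, run at 1/4 of that rate and at the full rate.
quarter = droplet_occupancy(10_000, 40_000)
full = droplet_occupancy(40_000, 40_000)

# Of the droplets that contain any cell, what fraction contain more than one?
coincidence_quarter = quarter["multiple"] / (1.0 - quarter["empty"])
coincidence_full = full["multiple"] / (1.0 - full["empty"])
print(f"1/4 rate: {coincidence_quarter:.1%} of occupied droplets hold 2+ cells")
print(f"1/1 rate: {coincidence_full:.1%} of occupied droplets hold 2+ cells")
```

Under these assumptions, roughly 12% of occupied droplets carry more than one cell at a quarter of the droplet frequency, against roughly 42% when the event rate matches the droplet frequency, which is why pushing the rate costs purity.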

One of the trends in flow cytometry is pushing the limit of the number of parameters that can be measured at one time. The CyTOF threw the gauntlet down to start this new race by changing how the signal was detected. It didn’t take long for fluorescence-based cytometers to begin pushing past the 18-fluorochrome limit, and now instruments that can do 24 or more fluorescent parameters at the same time are available. Spectral cytometry may push this limit to 50 parameters or more in the near future.

With all these parameters, the data files become very large very quickly, and analyzing such complex data becomes increasingly difficult. This has led to a search for analytical methods that reduce the complexity of the data so that populations of interest are easier to find. One of the most popular algorithms in flow cytometry circles is the tSNE algorithm. You can read more about it in these articles: van der Maaten and Hinton (2008), van der Maaten (2014), and Amir et al. (2013).

tSNE allows for the visualization of high-dimensional data on a single bivariate plot. From these single plots, further analysis can be performed using other analytical techniques. However, the tSNE analysis, although powerful, is very slow and memory-intensive. In order to complete the tSNE algorithm in a reasonable amount of time, most datasets are downsampled.

Downsampling is a process where a smaller number of events is used as representative of the whole sample. This happens all the time in our daily lives, and generally we don’t notice it. However, if you are a true audiophile, for example, there is a difference between an electronic copy of a piece of music and hearing it from the original source.

When the data is downsampled, there is a chance that rare events will be removed from the data. These low-frequency events are often the pieces of data the researcher is most interested in, and the larger the sample size that can be processed, the less likely they are to be lost.
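A quick worked example makes the risk concrete. The sketch below assumes simple random downsampling and uses a binomial model (a close approximation to sampling without replacement when the file is much larger than the sample). The rare-population frequency and the minimum event count are hypothetical values chosen for illustration; the sample sizes are ones used in the tests described later.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of exactly k successes, computed in log space for large n."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def prob_fewer_than(min_events: int, sample_size: int, frequency: float) -> float:
    """Chance that a random downsample keeps fewer than min_events rare cells."""
    return sum(binom_pmf(k, sample_size, frequency) for k in range(min_events))

frequency = 1e-4   # hypothetical rare population: 0.01% of all events
min_events = 20    # hypothetical minimum needed to recognize the population

for sample_size in (15_000, 100_000, 300_000):
    expected = sample_size * frequency
    risk = prob_fewer_than(min_events, sample_size, frequency)
    print(f"{sample_size:>7,} events: expect {expected:5.1f} rare cells, "
          f"P(fewer than {min_events}) = {risk:.3f}")
```

With these assumed values, the two smaller samples almost never retain 20 rare cells, while the 300,000-event sample usually does. The exact numbers depend entirely on the frequency and threshold you plug in.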

This brings us back to the need for speed. The goal of our high-dimensional experiments is to identify changes in the experimental system, finding those rare events that allow for a more complete understanding of the biology. It becomes a balancing act between adding more data and keeping the overall analysis time manageable.

There are several commercially available implementations of the tSNE algorithm. The question becomes, “How fast can each of these implementations perform the tSNE analysis on a standard file, using a typical desktop computer?” In the interest of fairness, you can download the file that was used and the method for running the competition here.

The competitors in this test were: Cytobank™, FCS Express™, and FlowJo®. For those more sophisticated, and as a benchmark, the freely available R implementation of tSNE was also run.

Before the results are revealed and the winner of the first tSNE speed race is named, it is important to understand how the timing was done and the steps in each implementation. These are presented below, in alphabetical order.

Cytobank™ requires uploading the data to the cloud, where you may be told that your data is in a queue to be processed. The timings below include both the upload and the wait time (in these tests, each was under 2 minutes, for a total of ~4 minutes). The queue time likely varies with how many other people around the world have samples waiting to be analyzed by tSNE, so your mileage may vary. Cytobank™ does not require a separate downsampling step, as “desired total events” is a setting built into the viSNE (tSNE) module. Thus, the time for downsampling is automatically part of the viSNE (tSNE) calculation time itself.

FCS Express™ does not require a separate downsampling step, as “sample size” is built into the FCS Express tSNE transformation tool. Thus, as in Cytobank, the time for downsampling is automatically part of the tSNE calculation time itself.

FlowJo® requires installation of the DownSample plugin. To use this for tSNE analysis, the user must select the number of events to be downsampled (plotted as “sample size” in the graphs below), save the layout, wait for the downsampling to finish, and use the tSNE plugin to calculate tSNE. Downsampling time is reflected in the graph below and was ~20 seconds, regardless of the number of events. Time to save the layout was neglected.

For the tests using R, subsamples of the original file were generated by sampling events at a fixed increment, and Rtsne (available here) was run on the sampled data. As with FlowJo, the total time (i.e., the time for the separate downsampling step plus the time for the tSNE calculation) was graphed.
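The two-step timing used for FlowJo and R can be sketched as follows. This is not the script used in the tests, which were run in R with Rtsne. It is a Python illustration that assumes sampling by increment means keeping every k-th event, and it uses a placeholder in place of the real tSNE call so that it runs without extra packages. All function names here are hypothetical.

```python
import time

def downsample_by_increment(events, target_size):
    """Keep every k-th event, then trim any excess from the end of the list."""
    step = max(1, len(events) // target_size)
    return events[::step][:target_size]

def timed(func, *args):
    """Run func(*args) and return (result, elapsed time in minutes)."""
    start = time.perf_counter()
    result = func(*args)
    return result, (time.perf_counter() - start) / 60.0

def placeholder_tsne(events):
    # Stand-in for the real embedding call; it returns the input unchanged.
    return events

events = list(range(1_000_000))  # hypothetical file with one million events
for target in (15_000, 100_000, 300_000):
    sample, t_down = timed(downsample_by_increment, events, target)
    _, t_tsne = timed(placeholder_tsne, sample)
    print(f"{target:>7,} requested, {len(sample):>7,} kept, "
          f"total {t_down + t_tsne:.4f} min")
```

Reporting the sum of the two timings mirrors how the FlowJo and R results were graphed, so packages with a built-in downsampling step are compared on equal footing.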

The methods and timing process are described here, along with the dataset.

Various sample sizes up to at least 300,000 were tested in all 4 software packages. Ungated plots of tSNE calculated on 100,000 events are shown in Figure 1 below. The color scaling and resolution of the FCS Express plot were changed from the default to facilitate comparison with the Cytobank plot, but this was not possible in FlowJo. Also, note that it is the nature of tSNE that results vary with each run, because the algorithm starts from a random initialization and optimizes a non-convex objective. Don’t worry if your plots differ in appearance from those below.

Figure 1: Results of tSNE analysis on 100,000 events in the 4 software implementations. Color is based on CD90 expression (FITC label/FL-1 in the dataset).

The meat of this friendly competition was to determine which of the packages performed the tSNE analysis the fastest. The winner was FCS Express (green), followed by R (purple) and FlowJo (red), with Cytobank (blue) coming in last (Figure 2). At 50,000 events, Cytobank’s calculation took almost 8 times as long as FCS Express; at 300,000 events, more than 5 times as long.

Figure 2: Results of speed test as a function of sample size.

Let’s break down the head-to-head between FCS Express™ (FCSE) and FlowJo® (FJ), 2 of the most commonly available packages for most researchers. For these packages, the tests were run in triplicate. When using a sample size of 15,000 events, the processing times were 0.56±0.03 minutes for FCSE versus 2.53±0.12 minutes for FJ. FJ took over 4 times as long as FCSE! At 100,000 events, FCSE still had a dramatic lead: 4.74±0.23 minutes versus FJ’s 17.91±0.48 minutes, nearly 4 times as long.
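For readers who want to check the arithmetic, the sketch below computes the FJ-to-FCSE time ratio at both sample sizes. It also propagates the reported ± values into the ratio by adding relative errors in quadrature, which assumes the ± values are standard deviations of independent measurements. That is my assumption, since the article does not say what the ± represents.

```python
import math

def fold_slower(slow, slow_err, fast, fast_err):
    """Return (ratio, uncertainty) for slow/fast using first-order error propagation."""
    ratio = slow / fast
    relative_error = math.hypot(slow_err / slow, fast_err / fast)
    return ratio, ratio * relative_error

# (FJ minutes, FJ error), (FCSE minutes, FCSE error), taken from the text above.
timings = {
    15_000: ((2.53, 0.12), (0.56, 0.03)),
    100_000: ((17.91, 0.48), (4.74, 0.23)),
}

for sample_size, ((fj, fj_err), (fcse, fcse_err)) in timings.items():
    ratio, err = fold_slower(fj, fj_err, fcse, fcse_err)
    print(f"{sample_size:>7,} events: FJ took {ratio:.2f} ± {err:.2f} times as long as FCSE")
```

The ratios come out at roughly 4.5 and 3.8, consistent with the "over 4 times" and "nearly 4 times" figures quoted above.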

So, why is the speed of the algorithm so important? Why worry when you can just set up the analysis and go for lunch? If you’re like me, you prefer to stay in the analysis mindset while working through data. Distractions, like a long break, can interrupt the train of thought about the analysis. Additionally, with long run times, it is depressing to return to the data and see that the calculation stopped prematurely because of an incorrect parameter or some other error.

More importantly, the tSNE analysis is one part of the process. Fully understanding the results and identifying the populations of interest requires additional work, including gating and backgating. Having the tSNE analysis completed 4 times faster means getting to this additional analysis that much sooner, and that in the same amount of time one can analyze roughly 4 times as much data with FCSE as with FJ.

Another reason that this becomes important is for rare event analysis. To ensure that the rare event population is not lost in the downsample, it is necessary to run a large file. Further, many researchers are analyzing multiple files merged from an experiment, to ensure more accurate and consistent analysis compared to single file analysis. Surprisingly, FJ was unable to complete the tSNE calculation on sample sizes larger than 200,000 events. The other 3 packages were able to complete over a million events — FCSE completed 2 million events in 4.37 hours, and R took 15.51 hours. Cytobank (using the Premium product) was limited to 1.3 million events, and that took 7.57 hours.
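The large-file timings are easier to compare as throughput. The sketch below converts each reported run into events per hour. It assumes R’s 15.51 hours was for the same 2 million events as FCSE, which the text implies but does not state outright. Because tSNE run time does not scale linearly with event count, the Cytobank figure, measured at 1.3 million events, is not strictly comparable with the other two.

```python
def events_per_hour(events: int, hours: float) -> float:
    """Average throughput over a whole run."""
    return events / hours

# (events processed, hours taken), from the figures reported above.
runs = {
    "FCS Express": (2_000_000, 4.37),
    "R (Rtsne)": (2_000_000, 15.51),   # assumed to be the same 2 million events
    "Cytobank": (1_300_000, 7.57),
}

for package, (events, hours) in runs.items():
    rate = events_per_hour(events, hours)
    print(f"{package:<12} {rate:>10,.0f} events/hour")
```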

In conclusion, this experiment was very illuminating. One takeaway is that the “cloud” is not necessarily better for analysis. After uploading data to Cytobank, analysis doesn’t tie up the local computer resources, which is a plus. It can also facilitate collaborations. However, it may be less expensive to invest in a more powerful local computer and take advantage of AWS or other cloud-based data storage platforms for sharing the data. If you’re comfortable with programming, go with R, the free implementation, although it’s much slower than FCSE. For my time (and we know time is money), FCSE is the winner of this speed test.

Remember, if you want to try this for yourself, the data and instructions are found here.

ABOUT TIM BUSHNELL, PHD

Tim Bushnell holds a PhD in Biology from the Rensselaer Polytechnic Institute. He is a co-founder of—and didactic mind behind—ExCyte, the world’s leading flow cytometry training company, which organization boasts a veritable library of in-the-lab resources on sequencing, microscopy, and related topics in the life sciences.
