2 Key SPADE Parameters To Adjust For Best Flow Cytometry Results

Mass cytometry panels routinely include 30 or more markers, but traditional analysis methods like bivariate gating can’t adequately parse the resulting high-dimensional data.

Spanning-tree progression analysis of density-normalized events (SPADE) is one of the most commonly used computational tools for visualizing and interpreting data sets from mass cytometry and multidimensional fluorescence flow cytometry experiments.

There are two key parameters in SPADE that you can adjust to get the best possible results: downsampling and target number of nodes, or k. Knowing how to set these values properly will improve the quality of your analysis.

Downsampling

Imagine your data as a cloud of points in high-dimensional space, where each dimension is one of the measured markers.

Cells that are similar to each other are close to one another in this cloud, just as similar cells fall together on a biaxial gating plot. This means that the cloud contains dense regions where there are groups of similar cells, and more sparsely populated regions where there are few similar cells.

Without correction, cells in the sparse regions around the edges of dense regions will likely be absorbed into the larger clusters during analysis, even when those sparse regions contain cell subsets that are small but phenotypically distinct.

Downsampling in the SPADE algorithm reduces the density variation across the cloud in order to give more equal weight to small, less dense groups of cells in the clustering process so they won’t get absorbed into the larger, denser regions.

After downsampling, SPADE clusters the data and then upsamples, mapping the cells that were removed during downsampling back into the clusters to which they are most similar.
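The downsample, cluster, upsample sequence described above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not SPADE's actual implementation: local density is taken as a neighbor count within a fixed radius, each cell is kept with probability target density divided by local density (capped at 1), and removed cells are mapped back to the label of their nearest retained cell. The function names, the radius, the target density, and the toy data are all hypothetical choices made for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two cells (tuples of marker values)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def local_density(cells, radius):
    """For each cell, count the cells within `radius` of it (itself included)."""
    r2 = radius * radius
    return [sum(1 for q in cells if squared_distance(p, q) <= r2) for p in cells]


def density_downsample(cells, radius, target_density, seed=0):
    """Return indices of retained cells.

    Cells in regions at or below `target_density` are always kept; cells in
    denser regions are kept with probability target_density / local density.
    """
    rng = random.Random(seed)
    densities = local_density(cells, radius)
    return [
        i for i, d in enumerate(densities)
        if rng.random() < min(1.0, target_density / d)
    ]


def upsample(cells, kept, kept_labels):
    """Give every cell the cluster label of its nearest retained cell."""
    labels = []
    for p in cells:
        nearest = min(
            range(len(kept)),
            key=lambda j: squared_distance(p, cells[kept[j]]),
        )
        labels.append(kept_labels[nearest])
    return labels


# Toy data (assumed): 500 cells in one dense population, 20 cells in a rare one.
rng = random.Random(42)
dense = [(rng.uniform(0.0, 0.5), rng.uniform(0.0, 0.5)) for _ in range(500)]
rare = [(rng.uniform(10.0, 10.5), rng.uniform(10.0, 10.5)) for _ in range(20)]
cells = dense + rare

kept = density_downsample(cells, radius=1.0, target_density=20)
kept_dense = sum(1 for i in kept if i < 500)
kept_rare = sum(1 for i in kept if i >= 500)
print(f"dense cells kept: {kept_dense} of 500, rare cells kept: {kept_rare} of 20")

# Stand-in for the clustering step: label retained cells by position.
kept_labels = [0 if cells[i][0] < 5.0 else 1 for i in kept]
all_labels = upsample(cells, kept, kept_labels)
```

In this toy example every rare cell survives while the dense population is thinned heavily, so the two populations carry similar weight when clustering begins. Upsampling then restores the full data set by assigning each removed cell to an existing cluster.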

You can adjust the extent of downsampling by setting either a percentage or an absolute number of cells. The percentage is the share of cells you want to keep during the downsampling process.

100% downsampling means that 100% of the cells are kept, so SPADE does not downsample at all. 5% downsampling means that only 5% of the cells are kept for clustering. Lowering the downsampling percentage in this way helps prevent small or rare populations of cells from being lost in the clustering step.

If you set an absolute number, rather than a percentage, SPADE downsamples until this number of cells remains.
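To make the two settings concrete, here is a minimal sketch of how many cells remain for clustering under each one. The helper `cells_retained` is hypothetical and written only for illustration; it is not part of any SPADE software, and the 200,000-cell data set is an assumed example.

```python
def cells_retained(total_cells, percent=None, absolute=None):
    """Number of cells kept for clustering under either downsampling setting.

    Exactly one of `percent` (share of cells to keep, 0-100) or `absolute`
    (number of cells to keep) must be given. Hypothetical helper.
    """
    if (percent is None) == (absolute is None):
        raise ValueError("give exactly one of percent or absolute")
    if percent is not None:
        if not 0 < percent <= 100:
            raise ValueError("percent must be greater than 0 and at most 100")
        return round(total_cells * percent / 100)
    # An absolute target larger than the data set keeps every cell.
    return min(absolute, total_cells)


print(cells_retained(200_000, percent=100))      # 200000: no downsampling
print(cells_retained(200_000, percent=5))        # 10000
print(cells_retained(200_000, absolute=50_000))  # 50000
```

Note that a smaller percentage means more aggressive downsampling, which is easy to misread because a "lower" setting removes more cells.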

If you’re working with a limited number of relatively large populations, like normal blood cells, you can probably safely leave the downsampling percentage set to the default. However, if you are seeking novel populations of cells, or very small populations like stem cells, you should consider setting the downsampling to a lower percentage in order to prevent losing those populations during clustering (Figure 1).

Figure 1. Downsampling reduces density variation to help determine which regions of the point cloud constitute discrete clusters. A) Initial data cloud in n dimensions shown before and after appropriate downsampling, assuming five real cell populations in the data. B) Clusters determined after too much downsampling. Low-density regions are inappropriately considered to be discrete clusters. C) Clusters determined after too little downsampling. Only high-density regions are considered to be clusters.

An important consideration is that when you set the downsampling percentage very low, you risk focusing on noise in the data. Sparse regions in the high-dimensional cloud might be treated as discrete clusters when in reality they represent nothing more than noise.

On the other hand, if you set the downsampling percentage too high, or if you don’t downsample at all, you risk overlooking smaller, “real” populations of cells.

Target Number of Nodes (k)

The second parameter that you can adjust in a SPADE analysis is the target number of nodes, or k. This value indicates the number of populations into which you want SPADE to divide the cells.

Keep in mind that this number is a target, not an exact value, so you may notice empty clusters in the final output if SPADE couldn’t find exactly k clusters in the data.

A good rule of thumb is to always ask SPADE for more nodes, or clusters, than you expect to find.

Overclustering in this way allows you to identify potentially unexpected subpopulations that are defined by subtle, high-dimensional patterns of marker co-expression (for example, small subpopulations of T cells in normal blood that are defined by subtle differences in their co-expression of several activation markers).

Additionally, overclustering helps you distinguish between major populations: when the SPADE tree is populated by more nodes, it is easier to see precisely where one major population, or group of nodes, ends and another begins.

When choosing k, you should consider how many populations you expect to find, the relative size of those populations, and the total number of cells in the data set.

If you ask SPADE for 500 clusters but only have 1,000 cells and 5 major populations, you’ll probably get back lots of empty clusters as well as clusters with only a few cells each.
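The arithmetic behind this example is simply the total number of cells divided by k. The sketch below computes that average and flags settings that spread the data too thinly. Both function names and the `min_mean_cells` cut-off are hypothetical, chosen for illustration; an appropriate cut-off depends on your own data.

```python
def mean_cells_per_node(total_cells, k):
    """Average number of cells per node if all k target nodes were used."""
    return total_cells / k


def check_k(total_cells, k, min_mean_cells=50):
    """Flag k values that spread the data too thinly.

    `min_mean_cells` is an arbitrary, hypothetical cut-off; choose one that
    suits your own data and biological question.
    """
    mean = mean_cells_per_node(total_cells, k)
    return {"k": k, "mean_cells_per_node": mean, "too_sparse": mean < min_mean_cells}


print(check_k(1_000, 500))    # the example above: 2 cells per node on average
print(check_k(100_000, 200))  # 500 cells per node on average
```

An average is only a rough guide, since real populations differ widely in size, but it is a quick sanity check before running a long analysis.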

It’s crucial to consider the biological implications of what you put into SPADE, and what you get back. For example, is a population of T cells that contains only 3 of the 1,000 cells “real” or significant?

Knowing your data and your biological system can help you decide appropriate cut-offs for k values, as well as what population sizes are likely to be biologically valid, versus just noise.

Another consideration is that small subsets resulting from high cluster numbers can be unstable: the phenotypic differences that separate their cells are so subtle that another round of clustering might group them differently. As a result, you may find that these populations won’t hold up to further computational or experimental scrutiny.
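One way to put a number on this kind of instability is to cluster the same cells twice and measure how often the two runs agree on whether a pair of cells belongs together. The sketch below uses the Rand index, a general-purpose measure for comparing clusterings. It is not something SPADE itself reports, and the label lists are made-up examples rather than real output.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of cell pairs on which two clusterings agree.

    A pair agrees if both runs put the two cells together, or both keep
    them apart. The cluster names themselves do not need to match.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both runs must label the same cells")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        1 for i, j in pairs
        if (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
    )
    return agree / len(pairs)


run_1 = [0, 0, 0, 1, 1, 2]
run_2 = [5, 5, 5, 7, 7, 9]  # same grouping, different cluster names
run_3 = [0, 0, 1, 1, 2, 2]  # small groups reshuffled

print(rand_index(run_1, run_2))  # 1.0: the runs agree on every pair
print(rand_index(run_1, run_3))  # about 0.67: the small groups are unstable
```

Because every pair of cells is compared, this is practical only for a subsample of a large data set. A score well below 1 for repeated runs suggests that the fine-grained clusters should be interpreted with caution.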

Conversely, setting k too low can cause you to miss smaller populations, as they’re likely to be merged into the larger, denser clusters of cells (Figure 2).

Figure 2. Target number of nodes (k) affects the number of clusters returned by SPADE. A) Initial data point cloud before and after clustering with an appropriate target number of nodes, assuming five real populations of cells in the data. The data are properly overclustered, allowing the analyst to manually delineate the major populations. B) Clustering result if the target number of nodes is too high. Major populations appear to contain multiple clusters, and lower-density regions are designated as discrete clusters containing very few cells. C) Clustering result if the target number of nodes is too low. All cells are grouped into a few large clusters, obscuring smaller populations of cells.

Fine-tuning your SPADE analysis requires a k value and downsampling percentage that will identify small, rare cell populations without amplifying noise in the data.

Going forward with your analysis, it’s always crucial to experimentally validate any novel populations that you discover with SPADE or other computational methods.

For further reading, see: Qiu et al. Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE. Nature Biotechnology, 2011.

Tim Bushnell, PhD

Tim Bushnell holds a PhD in Biology from the Rensselaer Polytechnic Institute. He is a co-founder of ExCyte, the world’s leading flow cytometry training company, and the didactic mind behind it. The organization maintains a library of in-the-lab resources on sequencing, microscopy, and related topics in the life sciences.
