We Tested 5 Major Flow Cytometry SPADE Programs for Speed – Here Are The Results

As a follow-up to our post on tSNE where we compared the speed of calculation in leading software packages, let’s consider the case of SPADE (Spanning-tree Progression Analysis of Density-normalized Events). A favored algorithm in the flow cytometry community, SPADE is used when dealing with highly multidimensional or otherwise complex datasets. Like tSNE, SPADE extracts information across events in your data unsupervised and presents the result in a unique visual format.

Unlike tSNE, which is a dimensionality-reduction algorithm that presents a multidimensional dataset in 2 dimensions (tSNE-1 and tSNE-2), SPADE is a clustering and graph-layout algorithm. The result is quite different from the two-dimensional plot of tSNE, and rather resembles a phylogenetic tree in its branching structure (Fig. 1).

As with a phylogenetic tree, similar clusters are grouped closer together, and dissimilar clusters are located more distally on the tree. If you’d like to read more on the theory and development of the SPADE algorithm, see the original literature [Qiu P., et al., Nat. Biotech. (2011); Qiu, P., PLoS One (2012)].
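
To make the tree idea concrete, here is a minimal sketch of how a spanning tree can link clusters so that similar ones end up adjacent. It assumes each cluster is summarized by a centroid in marker space and uses Prim's algorithm with Euclidean distance. The function name and the centroid values are made up for illustration, and the individual software packages may build their trees differently in detail.

```python
import math

def minimum_spanning_tree(centroids):
    """Return MST edges (i, j, distance) over cluster centroids using Prim's algorithm."""
    n = len(centroids)
    if n == 0:
        return []
    in_tree = {0}          # start growing the tree from cluster 0
    edges = []
    while len(in_tree) < n:
        best = None
        # Find the shortest edge joining a cluster in the tree to one outside it.
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                d = math.dist(centroids[i], centroids[j])
                if best is None or d < best[2]:
                    best = (i, j, d)
        edges.append(best)
        in_tree.add(best[1])
    return edges

# Hypothetical centroids for four clusters in a two-marker space (invented values).
centroids = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (1.0, 1.0)]
tree = minimum_spanning_tree(centroids)
for i, j, d in tree:
    print(f"cluster {i} -- cluster {j} (distance {d:.2f})")
```

In this toy example the three nearby clusters are chained together and the distant one hangs off the end of the tree, which is the "similar clusters sit close, dissimilar clusters sit distally" behavior described above.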

A speed test for flow cytometry SPADE programs

Figure 1: SPADE trees with 100 clusters, colored on the same parameter (TCR-beta), in five different software packages.

Having already argued for the need for speed, and given the growing popularity of these kinds of algorithms for dealing with complex datasets, we figured it would be of interest to the flow cytometry community to undertake a similar test across popular software packages.

We assumed this would be easy enough to do – we were wrong. Of the five software packages we tested (Cytobank, FCS Express, FlowJo, R, and the original, free software made available by the author of SPADE), only Cytobank and FCS Express were able to reliably return results from various FCS files – which is to say, they reliably returned some results while the others returned nothing at all!

Peng Qiu has generously made his original program, SPADE3, freely available to the flow community. However, he notes outright that it cannot necessarily handle the plethora of different formats produced by all available cytometers. Hiccups like these are to be expected of non-commercial software, and in any case, it’s hard to argue with free software of this caliber.

Plus, Peng invites you to contact him for help if your data isn’t recognized by his software, which might be worth a try considering that SPADE3 is surprisingly attractive and easy to use – albeit with limited functionality compared to paid-for, full-suite cytometry data analysis packages.

Commercial flow cytometry software is naturally held to a higher standard, especially if it touts the ability to run SPADE out of the box on your files. In both Cytobank and FCS Express, SPADE is integrated with the general user interface and requires no plugins or separate installations.

Before we talk results, let’s start by covering certain challenges regarding some of the aforementioned software and how they factored into the speed testing process.

Testing challenges

Despite its commercial status, FlowJo was one headache after another, starting with getting the SPADE implementation working. For SPADE, FlowJo requires the installation of “R” on your workstation, and must be told via its Preferences where this R installation is on your computer – in our case, this was tricky.

Also required is the SPADE package for R developed by the Nolan lab, which the lab no longer distributes via the Comprehensive R Archive Network (CRAN) or Bioconductor. Perhaps for these reasons, although links to R and the SPADE package are provided on the FlowJo website, some troubleshooting of the installation may be necessary.

Unfortunately, once those hurdles were cleared, it still wasn’t smooth sailing. FlowJo failed to run the SPADE algorithm on:

  • 3 of 5 Becton Dickinson (BD) FACSDiva files,
  • 1 of 1 Beckman Coulter (BC) Cytoflex file,
  • 1 of 2 Fluidigm CYTOF files,
  • 1 of 1 BC Gallios file (the same file we successfully ran tSNE on in our recent tests)
  • 1 of 1 BC Summit file

In each case, FlowJo returned cryptic error messages, e.g., “Could not create Gating ML elements” and “the algorithm did not generate a CSV result file.” And this despite the fact that we were attempting to calculate SPADE on ungated data. None of the datafiles we tested were more than a couple of years old – many were from 2018 and verified as exports from updated acquisition software packages, so it’s hard to imagine that defunct formats were an issue. To be fair, we ran a few of these files in R, which also resulted in error messages – so the fault may originate there.

Ultimately, whether SPADE will actually work for your data in FlowJo is essentially a crapshoot, even after the time and effort spent getting the system working. Nonetheless, the speed tests had to go on, as some of you may have files that do work in all five software packages without issue. We finally lucked out and found such a datafile, a 27-MB, 500,000-event, 14-parameter anonymized file provided on Peng Qiu’s SPADE website; full methods available here.

Downsampling

Before we reveal the results, we have to preface them with another discussion of downsampling. As in the case of tSNE, the type and degree of downsampling affects your results (as well as your ability to get those results with realistically attainable computing power) and even accounts for much of the calculation time. So let’s discuss the downsampling choices provided by each software package in alphabetical order.

Cytobank

Cytobank’s downsampling is stated to be density-dependent, and weighted to preferentially include rare events. We could not find out how strong this weight is, numerically speaking, nor did it appear to be user-variable. Downsampling is also performed to a target number of events or percentage of the original file(s), and these values are controllable by the user.

FlowJo

To put it bluntly, the nature of downsampling in FlowJo is both opaque and inflexible, unless perhaps you are something of an R maven; we are not. Moreover, it took some detective work well outside of the FlowJo documentation (and within the exported files from FlowJo) to verify that the defaults of the Nolan SPADE package were being employed, i.e. downsampling to 10% of the original number of cells and throwing out 1% of local low-density outliers. In our experience, changing these default percentages would require a level of R competency not possessed by the typical flow cytometrist.
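
For readers who want a feel for what those two defaults mean, here is a rough sketch of density-dependent downsampling. It is not the Nolan package's code: the function name, the inverse-density weighting, and the toy density values are all assumptions made for illustration. Local densities are taken as already computed and strictly positive.

```python
import random

def density_dependent_downsample(densities, keep_fraction=0.10,
                                 outlier_fraction=0.01, seed=0):
    """Return indices of retained events.

    Events in the lowest `outlier_fraction` by local density are discarded
    as noise. The rest are kept with probability inversely proportional to
    their density, scaled so that roughly `keep_fraction` of all events
    survive. Rare (low-density) populations are therefore favored.
    """
    n = len(densities)
    order = sorted(range(n), key=lambda i: densities[i])
    n_outliers = round(n * outlier_fraction)
    candidates = order[n_outliers:]              # drop the sparsest events
    weights = [1.0 / densities[i] for i in candidates]
    scale = (keep_fraction * n) / sum(weights)   # aim for the target count
    rng = random.Random(seed)
    return [i for i, w in zip(candidates, weights)
            if rng.random() < min(1.0, w * scale)]

# Toy data: 10 noise events, 90 rare events, 900 abundant events (invented densities).
densities = [1.0] * 10 + [5.0] * 90 + [100.0] * 900
kept = density_dependent_downsample(densities)
rare_kept = sum(1 for i in kept if 10 <= i < 100)
abundant_kept = sum(1 for i in kept if i >= 100)
print(len(kept), rare_kept, abundant_kept)
```

With settings like these, a 500,000-event file would come out at roughly 50,000 events, and the rare population is retained at a far higher rate than the abundant one.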

FCS Express

FCS Express provides several downsampling options, including:

  • None
  • Interval
  • Target
  • Weighted

All of these options are fully defined in the documentation for FCS Express, and these and other related attributes can be numerically specified by the user. It is worth noting that FCS Express allows for the exclusion of low- or high-local-density events (or both or neither); it does not presuppose an interest in rare or abundant populations, but lets the user decide “if” and “how much.”

The user may also specify a minimum percent of cells defining a cluster so that essentially bogus clusters (in terms of representing so few cells as to be of neither biological nor statistical importance) do not confound data interpretation.
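
As a small illustration of the idea, the sketch below flags clusters that hold less than a chosen percentage of all events. The function name, the labels, and the 1% threshold are hypothetical; what FCS Express actually does with such clusters is described in its own documentation, not here.

```python
from collections import Counter

def small_clusters(labels, min_percent):
    """Return the set of cluster ids holding fewer than min_percent of all events."""
    counts = Counter(labels)
    total = len(labels)
    return {c for c, k in counts.items() if 100.0 * k / total < min_percent}

# Hypothetical cluster assignments for 1,000 events.
labels = ["A"] * 600 + ["B"] * 395 + ["C"] * 5
flagged = small_clusters(labels, min_percent=1.0)
print(flagged)
```

Here cluster C holds 0.5% of the events, so it is the only one flagged.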

R

For those R mavens who enjoy running SPADE directly within R, it may be possible to change the nature of the downsampling (e.g., to choose to exclude or not exclude both low- and high-density outliers) in the SPADE package within R, as these settings were well documented by the Nolan lab when the SPADE package was created.

That is something we did not attempt, as we are not experts in R. If you are reasonably confident in your R skills, it should be possible to at least change the default percentages for the downsampling mentioned with respect to FlowJo above. Again, we did not attempt to stray from the defaults mentioned above, so the forthcoming comparison between R and FlowJo is strictly “apples-to-apples.”

SPADE3

Peng Qiu’s SPADE3 allows for a user-defined local density of events to be excluded as noise, along with a user-defined local density of cells or number of events corresponding to the rare populations intended to be captured.

Table 1 lists the different options tested in FCS Express. FCSE-A and FCSE-C are most similar to the downsampling parameters for FlowJo and Cytobank.

Table 1

FCSE SPADE Options

       | Downsample (sample size) | Downsampling Method | Sample Size   | Local Density Method
FCSE-A | 50,000                   | Target              | Specified     | Exact
FCSE-B |                          | Target              | Not specified | Exact
FCSE-C | 50,000                   | Target              | Specified     | Kernel
FCSE-D |                          | Target              | Not specified | Kernel
FCSE-E | Not Downsampled          |                     |               |

Without further ado, here are the results of the speed tests (Fig. 2), performed in triplicate.

Flow cytometry speed tests on different SPADE software packages

Figure 2: Results of SPADE Speed tests using 5 different software packages. Test file from Peng Qiu’s SPADE website (27-MB, 500,000 event, 14-parameter anonymized file)

Results

In discussing these results, let’s first note the obvious: the SPADE calculation took almost exactly the same amount of time in R as it did in FlowJo (~12 min). This was expected, considering that FlowJo makes use of R. Interestingly, at least at this modest sample size, Cytobank took nearly the same amount of time (~12 min) as FlowJo and R.

The time required to upload samples to the cloud, usually a few minutes, was included as in our previous tests for tSNE; the higher deviation between replicates is due to the queue time for the task, which presumably varies according to server load. Peng Qiu’s SPADE3 software took the longest (~25 min), but again, considering that this is a freely available, non-commercial software, the fact that it took only about twice as long as the aforementioned software is actually quite respectable. It is included in this speed test primarily as a benchmark, as it represents the original implementation of the SPADE algorithm.

As we have seen with tSNE, FCS Express handily outperformed the competition. If we consider the Exact method for calculating local density (see the FCS Express manual for details), FCS Express was roughly 9 times as fast (at 1.3 min) as FlowJo or Cytobank; using the Kernel method, it was roughly 46 times as fast (at 0.26 min).
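
The arithmetic behind these comparisons is easy to check from the rounded timings quoted in this post (~12 min, 1.3 min, and 0.26 min). The helper names below are our own. Note that "N times as fast" and "percent faster" are different quantities: a run that is 9.2 times as fast is about 823% faster, not 923% faster.

```python
def times_as_fast(slow_min, fast_min):
    """Ratio of run times: how many times as fast the quicker run is."""
    return slow_min / fast_min

def percent_faster(slow_min, fast_min):
    """Speed gain of the quicker run, expressed as a percentage."""
    return (slow_min / fast_min - 1.0) * 100.0

exact = times_as_fast(12.0, 1.3)     # FlowJo/Cytobank vs. FCS Express, Exact method
kernel = times_as_fast(12.0, 0.26)   # FlowJo/Cytobank vs. FCS Express, Kernel method
print(f"Exact:  {exact:.1f}x as fast ({percent_faster(12.0, 1.3):.0f}% faster)")
print(f"Kernel: {kernel:.1f}x as fast ({percent_faster(12.0, 0.26):.0f}% faster)")
```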

It is difficult to determine which is the more precise comparator because the documentation for the methods employed by FlowJo and Cytobank does not specify, but our money is on Kernel. It’s entirely possible that the full details reside in cited literature, but we only consulted the product manuals for each software package.

What about those other parameters that were tested? The results are shown below in Figure 3.

FCS Express speed tests using different parameters

Figure 3: Results of different parameters tested in FCS Express. Notice the scale is 0 to 4 minutes compared to the 0 to 30 minute scale in Figure 2.

The results speak for themselves. How would these figures hold up for larger sample sizes? Because most datafiles could not be run in FlowJo or R, we performed further tests only between Cytobank and FCS Express. We used a 100-MB, 1.3-million-event, 20-parameter anonymized file produced by BD FACSDiva ver. 8.0.1, and downsampled to 1 million events. Again, the full methods are available in the download.

Mil events in a flow cytometer speed test

Figure 4: 1.3-million-event file downsampled to 1 million events in FCS Express vs. Cytobank.

Again, FCS Express was faster than Cytobank, but this time, Cytobank was only 23% slower than FCS Express (60 vs. 49 min, respectively), in spite of the time required for uploading the file to the Cytobank servers and waiting in the queue for the task to actually begin (~15 min combined). While this suggests that Cytobank is handling large sample sizes efficiently, it also gives rise to the question: Why has the speed difference between the two packages, formerly orders of magnitude apart, diminished so?

It’s hard to say, but it may have something to do with differences in upsampling. You may recall from our post about tSNE that downsampling, though necessary for obtaining results with a realistic amount of computing power, can result in the loss of rare populations. While both Cytobank and FCS Express can report statistics for the entire file (including unsampled events), Cytobank does not actually graph all events in the SPADE tree. The settings explicitly state that only 50,000 cells are randomly chosen for the graph.
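
To illustrate what upsampling means in practice, here is a simplified sketch in which every event in the full file, sampled or not, is assigned to its nearest cluster. Matching against cluster centroids is our simplification, and the function name and coordinates are invented; the packages discussed here may instead match each event to its nearest downsampled neighbor or use another scheme.

```python
import math

def upsample(events, centroids):
    """Assign every event to the index of its nearest cluster centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(event, centroids[c]))
        for event in events
    ]

# Hypothetical two-marker data: two cluster centroids and four events.
centroids = [(0.0, 0.0), (10.0, 10.0)]
events = [(1.0, 0.5), (9.0, 11.0), (0.2, 0.1), (8.0, 8.0)]
assignments = upsample(events, centroids)
print(assignments)
```

Because every event receives a cluster label, per-cluster statistics can cover the whole file even when only a subset of events is drawn in the tree.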

Spade Plot Integration

Before we detail how all of this can affect you, let’s put it into the proper context by addressing the integration of SPADE plots with the rest of your analysis in the various software. Rather than going in alphabetical order, let’s start with the least integrated and end with the most integrated.

FlowJo and R

FlowJo and R are by far the least integrated. The results of a SPADE analysis in FlowJo and R consist of plots for all parameters exported as HTML and/or PDF, on which nothing (including the color scales in the legend, which are specified within the SPADE package in R) can be adjusted within the FlowJo workspace. That’s because these plots are not part of the live FlowJo workspace.

Speaking now only of FlowJo, neither the original FCS file nor the SPADE plot interacts with the rest of the workspace. That is to say, you cannot apply a 2D-gate from a regular plot to the SPADE plot, nor can you create a gate on the SPADE plot and backgate it onto other 2D-plots or histograms.

The workspace, where the list of files is found, does not even show per-cluster event counts. It’s possible that this is a transient issue, but to obtain any per-cluster event counts or parametric statistics from the SPADE plot, you have two options:

  1. Import the “clusters.fcs” file exported by FlowJo (which has added an additional parameter that effectively sorts the cells by cluster) back into FlowJo, and draw gates on those populations on a 2D plot around the clusters of interest.
  2. Look at the statistic-containing .csv files that are exported by the software in a zipped file.
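
Either route boils down to grouping events by their cluster label. A minimal sketch of that grouping follows, assuming the events have already been read into rows; reading the FCS or CSV file itself is not shown, and the column names "cluster" and "CD4" are hypothetical.

```python
from collections import defaultdict
from statistics import median

def per_cluster_stats(rows, cluster_key, marker_key):
    """Return the event count and median marker value for each cluster."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[cluster_key]].append(row[marker_key])
    return {c: {"events": len(v), "median": median(v)} for c, v in groups.items()}

# Hypothetical events, each carrying the added cluster parameter.
rows = [
    {"cluster": 1, "CD4": 120.0},
    {"cluster": 1, "CD4": 80.0},
    {"cluster": 2, "CD4": 900.0},
    {"cluster": 1, "CD4": 100.0},
]
stats = per_cluster_stats(rows, "cluster", "CD4")
print(stats)
```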

It should be noted that in the second case, these CSV files are of limited utility due to the complete lack of live interplay between the statistics and the workspace. We should also note that FlowJo suggests viewing the exported GML files in the Java-dependent, free, open-source Cytoscape software for greater interplay between 2D plots and the SPADE trees that you can generate and edit there.

However, current versions of Cytoscape no longer support the SPADE plugin. I suppose you could try installing the older, recommended version from 2011, but this version is no longer supported, and we are not certain whether it is compatible with updated versions of Java.

SPADE3

Peng Qiu’s SPADE3 software does a little better. The SPADE plots update live when you change the coloring parameter or the scale. The latter has formattable endpoints, so you can make them uniform across plots to facilitate visual comparison between files or parameters. Below the SPADE tree are smaller 2D plots that display the downsample-enriched population in any desired 2D-parametric space. However, as is the case with FlowJo, there is no backgating between 2D plots and the SPADE plot.

This is not surprising, as SPADE3 does not claim to be a full-service cytometry package; in fact, the SPADE3 manual explains that any preliminary gating must be done in another program, and the gated events exported for use in SPADE3. Nonetheless, the SPADE3 program is worth a look if it recognizes your files. There are some nice creative features such as “autosuggest annotation”, which can simplify the process of breaking your SPADE tree down into related groups (similar to clades in phylogenetics).

Cytobank

Not surprisingly, Cytobank’s integration of SPADE with the rest of your analysis is pretty good, given that SPADE is a major draw. The SPADE plots update quickly in response to changing the coloring parameter or scale. Surprisingly, you cannot specify the endpoints of the scale as you can in SPADE3 or FCS Express, but there is a way to standardize the scale across all files in an experiment. You can also choose between “symmetric” and “asymmetric” color scaling.

The parametric statistics on clusters or pooled clusters are easy to obtain within the live interface or via export. However, the interplay between the SPADE plot and the regular 2D plot at the side is not bidirectional – although you can use the “bubbles” functionality to backgate/apply a SPADE gate onto the 2D plot, you cannot create a gate on the 2D plot and apply that gate to the SPADE plot.

It’s thus impossible to, say, manually draw a gate on your FoxP3+ T cells, or on a more mysterious subset within your 2D plots, and see which cluster(s) they fall into on your SPADE plot. For software this popular for SPADE, this is a surprising deficiency. And now we’ll come back to our point from a few paragraphs ago, i.e., that only 50,000 cells are randomly chosen to graph in the SPADE plot.

Aside from the fact that we cannot even draw a gate on rare events in the 2D plot and verify that they are included within the SPADE tree, how do we even know that this randomly selected subset of cells is going to provide the most faithful SPADE tree representation to begin with? I can’t help but think that this method of plotting, though expedient, at least partially defeats the purpose of using large sample sizes (i.e. downsampling within reason).

FCS Express

Lastly, we come to FCS Express. Here, the integration of the SPADE plots with the rest of your plots and statistics is efficient and seamless. As with any other plot type, statistics on a per-cluster or per-cluster-gate basis (using the “well gates” feature in FCS Express) are easily obtained within the layout or via export.

As with SPADE3 and Cytobank, you can change the coloring parameter or scale, and the SPADE plots update live. Notably, as in SPADE3, the color scaling can not only be made uniform across plots showing different parameters or files, but the endpoints can be specified. There are additional formatting options including percentile, mean +/- SD, and fixed range log/linear.

These options are the most comprehensive among all software tested, so you should be able to obtain the SPADE trees you need to make your point. The interplay between regular 1D- or 2D-plots and the SPADE tree is fully bidirectional (gates on either can be applied or backgated onto the other like any other gate). In our opinion, this should be standard.
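
Conceptually, bidirectional gating is just two lookups over the same per-event cluster labels. The sketch below is our own illustration with invented data and function names, not how any of the tested packages implement it.

```python
from collections import Counter

def clusters_hit_by_gate(in_gate, cluster_ids):
    """2D gate -> tree: count the gated events that fall in each cluster."""
    return Counter(c for gated, c in zip(in_gate, cluster_ids) if gated)

def events_in_clusters(cluster_ids, selected):
    """Tree gate -> 2D plot: indices of events belonging to the selected clusters."""
    return [i for i, c in enumerate(cluster_ids) if c in selected]

# Hypothetical data: a cluster label per event and a manual gate drawn on a 2D plot.
cluster_ids = [3, 3, 7, 7, 7, 9]
in_gate = [True, False, True, True, False, False]

hits = clusters_hit_by_gate(in_gate, cluster_ids)
back = events_in_clusters(cluster_ids, {7, 9})
print(hits, back)
```

The first lookup answers "which clusters do my manually gated cells land in?", and the second answers "which events sit in the clusters I selected on the tree?".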

Consider at the very least how, despite this being an “unsupervised” method, at the early stages of implementing a SPADE analysis for a given experiment type, correlation with at least a few high-level manual gates is helpful in evaluating tree architecture. Finally, because 100% of the cells within your file are upsampled for plotting in the SPADE tree, you can be assured that all events, no matter how rare, are represented there.

Conclusion

In summary, the fastest software (FCS Express) was also the most replete with features and integration, but the slowest software (Peng Qiu’s original implementation in the freely available SPADE3) was not the most lacking in those regards – an interesting result to say the least.

Thanks for staying with us through the end. What started out as a supposedly simple speed comparison became a bit more involved due to unexpected differences among the software in terms of datafile compatibility and user interface. We hope we’ve inspired you to check out these and other differences on your own – as usual, we’d love to hear back from you with what you discover.


ABOUT TIM BUSHNELL, PHD

Tim Bushnell holds a PhD in Biology from the Rensselaer Polytechnic Institute. He is a co-founder of, and the didactic mind behind, ExCyte, the world’s leading flow cytometry training company, an organization that boasts a veritable library of in-the-lab resources on sequencing, microscopy, and related topics in the life sciences.
