Flow cytometry data are rich in numbers.

Data from experiments can be population measurements (the percentage of CD4+ cells, for example) or expression levels (the median fluorescence intensity of CD69 on activated T cells, for example).

Researchers are often content to show histograms to illustrate their point after a flow experiment. This approach misses the opportunity to take that content-rich data and extend it into a statistical analysis.

To properly perform statistical analysis, the first step is to understand the hypothesis. The hypothesis will guide the statistical analysis, identifying the correct test to be performed. There are several things that need to be considered when beginning the statistical analysis of the data.

**1. Design your experiment properly from the start.**

Statistical power answers the question: what is the probability of correctly rejecting the null hypothesis when the null hypothesis is false? There are three factors that influence the power of an experiment: the sample size, the spread of the data and the number of replicates. The power of the experiment is related to its ability to avoid statistical errors.
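As a rough illustration of how sample size and spread feed into power, the sketch below approximates the power of a two-sided comparison of two group means. It is not a substitute for a proper power analysis: it assumes roughly normal data, equal group sizes, and a normal approximation in place of the t distribution, so it is optimistic for very small groups. The function name `approx_power` and the example effect size are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-group comparison of means.

    effect_size is the difference in means divided by the common standard
    deviation, so a larger spread in the data means a smaller effect_size.
    Assumption: normal approximation, equal group sizes.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)             # critical value for the threshold
    shift = effect_size * sqrt(n_per_group / 2)   # expected test statistic
    # Probability that the statistic lands in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical scenario: the true difference equals one standard deviation.
for n in (3, 5, 10, 20):
    print(f"n = {n:2d} per group -> approximate power {approx_power(1.0, n):.2f}")
```

Running the loop shows power rising as the number of samples per group grows; halving `effect_size` (for example, because the data are twice as spread out) lowers it again.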

**2. Know the classes of statistical errors and how to avoid them.**

False positives (Type I errors) are when a true null hypothesis is incorrectly rejected. False negatives (Type II errors) are when the test fails to reject a false null hypothesis.

In fact, the power of the experiment is defined as 1 − β, where β is the probability of committing a Type II error. This is equal to true positives / (true positives + false negatives).
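To make the relationship concrete: power is the fraction of real effects that a test detects, and the Type II error rate (β) is the fraction it misses, so the two always sum to 1. The counts below are invented purely for illustration, and `power_from_counts` is a hypothetical helper, not part of any library.

```python
def power_from_counts(true_positives: int, false_negatives: int) -> tuple[float, float]:
    """Return (power, beta) from counts of detected and missed real effects."""
    real_effects = true_positives + false_negatives
    power = true_positives / real_effects   # real effects correctly detected
    beta = false_negatives / real_effects   # real effects missed (Type II errors)
    return power, beta


# Hypothetical example: 100 experiments in which the effect is real,
# 80 of which reach significance.
power, beta = power_from_counts(true_positives=80, false_negatives=20)
print(f"power = {power:.2f}, beta = {beta:.2f}, power + beta = {power + beta:.2f}")
```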

**3. Use the appropriate statistical test.**

The biological hypothesis and experimental design will determine what is the appropriate test for the data. The distribution of the data is also important to consider. How best to determine the correct test? This table can help you determine which test is most appropriate.

**4. Set the appropriate threshold.**

The α value is the threshold that will be used to determine whether the data are statistically significant or not. For historical reasons, this value is usually set at 0.05. This can be interpreted as the chance of finding significance where there is none (i.e. the chance of committing a Type I error).
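One way to see what the threshold means is to simulate many experiments in which there is truly no difference between groups and count how often a test still comes out significant. The sketch below does this under simplifying assumptions: both groups are drawn from the same normal distribution, the standard deviation is treated as known (so a z-test is used rather than a t-test), and the group size and number of simulated experiments are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def z_test_p_value(group_a, group_b, sd):
    """Two-sided p-value for a difference in means, assuming a known SD."""
    se = sd * sqrt(1 / len(group_a) + 1 / len(group_b))
    z = (mean(group_a) - mean(group_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(alpha=0.05, n_per_group=5, n_experiments=20_000, seed=1):
    """Fraction of null experiments (no real difference) called significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Both groups come from the same distribution: the null is true.
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if z_test_p_value(a, b, sd=1.0) < alpha:
            hits += 1
    return hits / n_experiments


print(f"False positive rate at a 0.05 threshold: {false_positive_rate(0.05):.3f}")
print(f"False positive rate at a 0.01 threshold: {false_positive_rate(0.01):.3f}")
```

The simulated rates should sit close to the chosen thresholds, which is exactly what the threshold controls: the long-run rate of Type I errors when the null hypothesis is true.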

**5. Avoid the "more significant" trap.**

Once the α value is set, if the P-value is below that value, the data are statistically significant. With a threshold of 0.05, a P-value of 0.01 is not "more significant" than a P-value of 0.04. If there is a desire to decrease the chance of a Type I error, the threshold should be set to a more stringent level (0.01 or lower) before the analysis is run.

**6. Avoid multiple pairwise comparisons.**

In the case where the experimental design has Drug X, Drug Y and the combination of Drug X and Y, to be compared to an untreated sample, what is the best test? Pairwise comparisons should not be performed in this case for the following reason. With α set to 0.05, there is a 5% chance of committing a Type I error. With each additional comparison, the chance of committing at least one Type I error increases, as shown in the table below.

| Number of pairwise comparisons | Chance of a Type I error |
| --- | --- |
| 2 | 10% |
| 3 | 14% |
| 4 | 19% |
| 5 | 23% |
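The figures in the table can be reproduced, approximately, with the formula 1 − (1 − α)^k for k comparisons. The sketch below assumes the comparisons are independent of one another, which is a simplification when groups are shared across comparisons. The function name `familywise_error_rate` is chosen here for illustration.

```python
from math import comb


def familywise_error_rate(n_comparisons: int, alpha: float = 0.05) -> float:
    """Chance of at least one Type I error across independent comparisons."""
    return 1 - (1 - alpha) ** n_comparisons


for k in range(1, 6):
    print(f"{k} comparison(s): {familywise_error_rate(k):.1%}")

# Four groups (untreated, Drug X, Drug Y, Drug X + Y) allow this many
# distinct pairwise comparisons if every pair is tested.
n_pairs = comb(4, 2)
print(f"All {n_pairs} pairs among 4 groups: {familywise_error_rate(n_pairs):.1%}")
```

Testing every pair among the four groups in the drug example means six comparisons, which pushes the chance of at least one false positive well beyond the nominal 5%.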

At the end of the day, the statistical analysis of your flow cytometry data is a critical step in testing the validity of your hypothesis. With a careful and considered approach to choosing and performing the correct tests, the published data will stand up to the rigors of peer review and help lead to another discovery.

### Tim Bushnell

My other passions include grilling, wine tasting, and real food. To be honest, my biggest passion is flow cytometry, which is something that Carol and I share. My personal mission is to make flow cytometry education accessible, relevant, and fun. I’ve had a long history in the field starting all the way back in graduate school.