The Comparison class for statistics • dmplot

Introduction

The Comparison class is a fundamental component of our package, designed to facilitate the analysis of grouped data across various domains, including bioinformatics, finance, and general statistical applications. This vignette gives an overview of the Comparison class: what it is for and how to use it effectively in your analyses.

What is the Comparison Class?

The Comparison class is an R6 class that represents a comparison between two groups of samples. It encapsulates the essential information needed to perform comparative analyses, including:

The name of the comparison
The order of groups being compared
A table containing sample information for each group

This structure allows for clear organization of experimental designs and streamlines subsequent analyses.

Why is the Comparison Class Important?

The Comparison class serves several crucial purposes:

Experimental Design: It provides a structured way to define the groups in your experiment, which is crucial for many statistical tests (e.g., t-tests, ANOVA).
Hypothesis Testing: By explicitly defining the groups and their order, it clarifies the null and alternative hypotheses in your statistical tests.
Reproducibility: Encapsulating all relevant information about a comparison enhances the reproducibility of statistical analyses.
Flexibility: It can be used in various statistical contexts, from simple two-group comparisons to more complex experimental designs.
Integration: The class is designed to work seamlessly with other statistical components of the package, facilitating complex analyses.

In essence, the Comparison class is a formal representation of the experimental design: it gives you a structured way to define and work with comparisons in statistical analyses.

Creating a Comparison Object for Statistical Analysis

To create a Comparison object, you need to provide three key pieces of information:

comparison_name: A string that describes the comparison.
group_order: A vector of two group names, specifying the order of comparison.
comparison_table: A data.table containing the sample information.

Here’s a basic example:

box::use(dt = data.table)
box::use(dmplot[ Comparison ])

comparison <- Comparison$new(
    comparison_name = "Treatment over Control",
    group_order = c("Control", "Treatment"),
    comparison_table = dt$data.table(
        group = c("Control", "Control", "Treatment", "Treatment"),
        sample = c("Sample1", "Sample2", "Sample3", "Sample4")
    )
)

Pay close attention to the language used above to describe the comparison. We always use the word “over” instead of “vs” because “over” makes explicit which group is the control or baseline: mathematically speaking, the control is the denominator of the comparison.

In this example, we’re setting up a comparison between a “Treatment” group and a “Control” group, with two samples in each group.
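
To make the “over” convention concrete, here is a minimal sketch in base R of a “Treatment over Control” ratio. The measurement values are hypothetical and are not part of the package or of the example above; the sketch only illustrates that the control group ends up in the denominator.

```r
# Hypothetical measurements for the four samples (assumed values, for illustration only).
values <- c(Sample1 = 10, Sample2 = 12, Sample3 = 20, Sample4 = 24)
group <- c("Control", "Control", "Treatment", "Treatment")

# Mean of each group.
control_mean <- mean(values[group == "Control"])
treatment_mean <- mean(values[group == "Treatment"])

# "Treatment over Control": the control mean is the denominator.
fold_change <- treatment_mean / control_mean
log2_fold_change <- log2(fold_change)
```

Swapping the two groups would invert the ratio and flip the sign of the log2 fold change, which is why the wording of the comparison name matters.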

Once created, a Comparison object contains several important pieces of information:

print(comparison)
#> Comparison R6 object
#> -----------------
#> Comparison Name:  Treatment over Control 
#> Group Order:  Control, Treatment 
#> Comparison Table:
#>       group  sample condition
#> 1   Control Sample1   control
#> 2   Control Sample2   control
#> 3 Treatment Sample3      test
#> 4 Treatment Sample4      test

Note that the Comparison class automatically adds a condition column to the comparison table, designating the first group in group_order as “control” and the second as “test”. This is crucial for many statistical tests where one group is considered the baseline or control condition.
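
The labelling rule described above can be sketched in a few lines. This is not the package's actual implementation, only an illustration of the rule under stated assumptions; a base R data.frame stands in for the data.table so that the sketch has no dependencies.

```r
# Same groups and samples as in the example above.
group_order <- c("Control", "Treatment")
comparison_table <- data.frame(
    group = c("Control", "Control", "Treatment", "Treatment"),
    sample = c("Sample1", "Sample2", "Sample3", "Sample4"),
    stringsAsFactors = FALSE
)

# Rows in the first group of group_order become "control"; all others become "test".
comparison_table$condition <- ifelse(
    comparison_table$group == group_order[1],
    "control",
    "test"
)
```

Reversing group_order in this sketch would swap the two labels, which is the same reason the order matters when you construct a Comparison object.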

Best Practices for Statistical Applications

Clear Hypotheses: Use the comparison_name to clearly state your statistical hypothesis.
Group Order: In two-group comparisons, typically set the control or baseline group as the first in group_order.
Sample Size: Ensure balanced sample sizes when possible for robust statistical comparisons.
Data Assumptions: Remember that the Comparison class doesn’t check for the assumptions of your statistical tests (e.g., normality for t-tests). Always verify these separately.
Multiple Testing: When performing multiple comparisons, consider corrections for multiple testing (e.g., Bonferroni correction).
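
As an illustration of the multiple-testing point, the sketch below applies a Bonferroni correction with base R's p.adjust(). The p-values are hypothetical, standing in for the results of three separate comparisons; nothing here is specific to the Comparison class.

```r
# Hypothetical raw p-values from three separate comparisons (assumed values).
p_values <- c(0.01, 0.02, 0.04)

# Bonferroni: each p-value is multiplied by the number of tests and capped at 1.
p_bonferroni <- p.adjust(p_values, method = "bonferroni")

# Which comparisons remain significant at the 5% level after correction.
significant <- p_bonferroni < 0.05
```

With three tests, only the smallest p-value stays below 0.05 after correction. Less conservative alternatives such as method = "holm" or method = "BH" are available through the same function.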

Conclusion

The Comparison class provides a robust and flexible framework for defining and working with statistical comparisons across various domains. By encapsulating the key elements of a comparison (its name, group order, and sample information), it simplifies the process of conducting statistical analyses and promotes reproducible research practices.