The Comparison class for statistics
Source: vignettes/stat_the-comparison-class-for-statistics.Rmd
Introduction
The Comparison class is a fundamental component of our package, designed to facilitate the analysis of grouped data across various domains, including bioinformatics, finance, and general statistical applications. This vignette provides an overview of the Comparison class, its purpose, and how to use it effectively in your analyses.
What is the Comparison Class?
The Comparison class is an R6 class that represents a comparison between two groups of samples. It encapsulates the essential information needed to perform comparative analyses, including:
- The name of the comparison
- The order of groups being compared
- A table containing sample information for each group
This structure allows for clear organization of experimental designs and streamlines subsequent analyses.
Why is the Comparison Class Important?
The Comparison class serves several crucial purposes:
- Experimental Design: It provides a structured way to define the groups in your experiment, which is crucial for many statistical tests (e.g., t-tests, ANOVA).
- Hypothesis Testing: By explicitly defining the groups and their order, it clarifies the null and alternative hypotheses in your statistical tests.
- Reproducibility: Encapsulating all relevant information about a comparison enhances the reproducibility of statistical analyses.
- Flexibility: It can be used in various statistical contexts, from simple two-group comparisons to more complex experimental designs.
- Integration: The class is designed to work seamlessly with other statistical components of the package, facilitating complex analyses.
In essence, the Comparison class is a formal representation of the experimental design, providing a structured way to define and work with comparisons in statistical analyses.
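To make the point about group order and hypotheses concrete, here is a minimal base R sketch. It does not use the Comparison class itself; the measurement values are made up for illustration, and t.test() is only one of many tests where the order matters. It shows that reversing the group order flips the sign of the test statistic, which is why the order should be stated explicitly.

```r
# Hypothetical measurements; the values are invented for illustration only.
group_order <- c("Control", "Treatment")
measurements <- data.frame(
  group = factor(rep(group_order, each = 3), levels = group_order),
  value = c(5.1, 4.9, 5.3, 6.2, 6.0, 6.4)
)

# With the formula interface, t.test() compares the first factor level
# against the second, so the level order determines the sign of the statistic.
forward <- t.test(value ~ group, data = measurements)

# Same data, reversed group order.
measurements$group_reversed <- factor(
  measurements$group,
  levels = rev(group_order)
)
reversed <- t.test(value ~ group_reversed, data = measurements)

forward_t <- unname(forward$statistic)
reversed_t <- unname(reversed$statistic)

print(forward_t)
print(reversed_t)
```

The two statistics have the same magnitude and opposite signs, so a result is only interpretable when the reader knows which group came first.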
Creating a Comparison Object for Statistical Analysis
To create a Comparison object, you need to provide three key pieces of information:
- comparison_name: A string that describes the comparison.
- group_order: A vector of two group names, specifying the order of comparison.
- comparison_table: A data.table containing the sample information.
Here’s a basic example:
box::use(dt = data.table)
box::use(dmplot[ Comparison ])
comparison <- Comparison$new(
  comparison_name = "Treatment over Control",
  group_order = c("Control", "Treatment"),
  comparison_table = dt$data.table(
    group = c("Control", "Control", "Treatment", "Treatment"),
    sample = c("Sample1", "Sample2", "Sample3", "Sample4")
  )
)
I want you to pay close attention to the language used above when describing a comparison. We always use the word “over” instead of “vs” to describe the comparison. This is because “over” is more descriptive and explicit in conveying which group is the control or baseline group; mathematically speaking, the control is the denominator in the comparison.
In this example, we’re setting up a comparison between a “Treatment” group and a “Control” group, with two samples in each group.
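The “over” wording can be illustrated with a small base R sketch. The numbers below are hypothetical and are not produced by the package; the point is only that in “Treatment over Control” the control group sits in the denominator of the ratio.

```r
# Hypothetical measurements for the two groups (illustrative values only).
control_values   <- c(10, 12)
treatment_values <- c(20, 24)

# "Treatment over Control": treatment is the numerator, control the denominator.
ratio <- mean(treatment_values) / mean(control_values)

# A log2 ratio is a common way to report the same quantity symmetrically.
log2_ratio <- log2(ratio)

print(ratio)
print(log2_ratio)
```

Swapping the two groups would invert the ratio and negate the log2 value, which is exactly the ambiguity that “vs” leaves open.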
Once created, a Comparison object contains several important pieces of information:
print(comparison)
#> Comparison R6 object
#> -----------------
#> Comparison Name: Treatment over Control
#> Group Order: Control, Treatment
#> Comparison Table:
#>       group  sample condition
#> 1   Control Sample1   control
#> 2   Control Sample2   control
#> 3 Treatment Sample3      test
#> 4 Treatment Sample4      test
Note that the Comparison class automatically adds a condition column to the comparison table, designating the first group in group_order as “control” and the second as “test”. This is crucial for many statistical tests where one group is considered the baseline or control condition.
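The labelling rule described above can be mimicked in base R. This is a hypothetical stand-in for illustration, not the package's actual implementation, and it uses a plain data.frame rather than a data.table so that it runs without extra packages.

```r
# Hypothetical re-creation of the labelling rule; not dmplot's own code.
group_order <- c("Control", "Treatment")
comparison_table <- data.frame(
  group  = c("Control", "Control", "Treatment", "Treatment"),
  sample = c("Sample1", "Sample2", "Sample3", "Sample4")
)

# The first entry of group_order is labelled "control", everything else "test".
comparison_table$condition <- ifelse(
  comparison_table$group == group_order[1],
  "control",
  "test"
)

print(comparison_table)
```

Because the label depends only on position in group_order, swapping the two names in group_order swaps which samples are treated as the baseline.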
Best Practices for Statistical Applications
- Clear Hypotheses: Use the comparison_name to clearly state your statistical hypothesis.
- Group Order: In two-group comparisons, typically set the control or baseline group as the first in group_order.
- Sample Size: Ensure balanced sample sizes when possible for robust statistical comparisons.
- Data Assumptions: Remember that the Comparison class doesn’t check for the assumptions of your statistical tests (e.g., normality for t-tests). Always verify these separately.
- Multiple Testing: When performing multiple comparisons, consider corrections for multiple testing (e.g., Bonferroni correction).
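Two of these practices are easy to check with base R alone. The sketch below is only an illustration under assumed inputs: the group labels mirror the earlier example, and the raw p-values are invented. It checks whether group sizes are balanced and applies a Bonferroni correction with p.adjust() from the stats package.

```r
# Group labels as in the earlier example; a balanced design has equal counts.
groups <- c("Control", "Control", "Treatment", "Treatment")
group_sizes <- table(groups)
is_balanced <- length(unique(as.vector(group_sizes))) == 1

# Hypothetical raw p-values from three separate comparisons.
p_raw <- c(0.01, 0.04, 0.03)

# Bonferroni multiplies each p-value by the number of tests, capped at 1.
p_bonferroni <- p.adjust(p_raw, method = "bonferroni")

print(group_sizes)
print(is_balanced)
print(p_bonferroni)
```

Bonferroni is conservative; p.adjust() also offers other methods such as "holm" and "BH" if a less strict correction suits the analysis.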
Conclusion
The Comparison class provides a robust and flexible framework for defining and working with statistical comparisons across various domains. By encapsulating the key elements of a comparison (its name, group order, and sample information), it simplifies the process of conducting statistical analyses and promotes reproducible research practices.