False Discovery Rate Correction
================================

This document describes the multiple testing correction methods used in WASP2
to control false positive rates when testing many genomic regions.

.. contents:: Contents
   :local:
   :depth: 2

The Multiple Testing Problem
----------------------------

When testing thousands of genomic regions for allelic imbalance, even a small
per-test false positive rate leads to many false discoveries:

.. math::

   E[\text{false positives}] = m \cdot \alpha

For :math:`m = 10{,}000` tests at :math:`\alpha = 0.05`, we expect 500 false
positives by chance alone.

**Example**: Testing 20,000 genes for ASE

- At α = 0.05: ~1,000 expected false positives
- At α = 0.01: ~200 expected false positives
- Even stringent thresholds yield many false discoveries

Error Rate Definitions
----------------------

Family-Wise Error Rate (FWER)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The probability of making **any** false discovery:

.. math::

   \text{FWER} = P(V \geq 1)

where :math:`V` is the number of false positives.

FWER control (e.g., Bonferroni) is very conservative for genomic studies.

False Discovery Rate (FDR)
^^^^^^^^^^^^^^^^^^^^^^^^^^

The expected **proportion** of false discoveries among rejections:

.. math::

   \text{FDR} = E\left[\frac{V}{R}\right]

where :math:`V` is false positives and :math:`R` is total rejections.

FDR is more appropriate for discovery-oriented genomic studies where:

- Some false positives are acceptable
- The goal is to prioritize candidates for follow-up
- The number of tests is very large

Benjamini-Hochberg Procedure
----------------------------

WASP2 uses the Benjamini-Hochberg (BH) procedure [Benjamini1995]_ for FDR control.

Algorithm
^^^^^^^^^

Given :math:`m` p-values :math:`p_1, p_2, \ldots, p_m`:

1. Sort p-values: :math:`p_{(1)} \leq p_{(2)} \leq \cdots \leq p_{(m)}`
2. Find the largest :math:`k` such that :math:`p_{(k)} \leq \frac{k}{m} \cdot q`
3. Reject all hypotheses with :math:`p_{(i)} \leq p_{(k)}`

where :math:`q` is the target FDR level (typically 0.05 or 0.1).

**Adjusted P-Values (q-values)**:

WASP2 reports BH-adjusted p-values:

.. math::

   q_i = \min_{j \geq i} \left\{ \frac{m \cdot p_{(j)}}{j} \right\}

These can be interpreted as the minimum FDR at which the hypothesis would be
rejected.

Implementation
^^^^^^^^^^^^^^

.. code-block:: python

   from scipy.stats import false_discovery_control

   # WASP2 uses the BH method
   fdr_pvals = false_discovery_control(pvals, method='bh')

   # Equivalent manual implementation
   def benjamini_hochberg(pvals):
       n = len(pvals)
       ranked = np.argsort(pvals)
       adjusted = np.empty(n)
       cummin = 1.0
       for i in range(n - 1, -1, -1):
           idx = ranked[i]
           adjusted[idx] = min(cummin, pvals[idx] * n / (i + 1))
           cummin = adjusted[idx]
       return adjusted

Properties
^^^^^^^^^^

**Advantages**:

- Controls FDR at level :math:`q` under independence
- More powerful than FWER methods (fewer false negatives)
- Simple to compute and interpret
- Works well with continuous p-values

**Assumptions**:

- P-values under the null are uniformly distributed
- Independence or positive regression dependence (PRDS)

The chi-squared p-values from WASP2's likelihood ratio test satisfy these
assumptions when regions are independent.

Alternative Methods
-------------------

While WASP2 uses BH by default, researchers may consider alternatives
for specific scenarios.

Benjamini-Yekutieli (BY)
^^^^^^^^^^^^^^^^^^^^^^^^

For arbitrary dependence between tests:

.. math::

   p_{(k)} \leq \frac{k}{m \cdot c(m)} \cdot q

where :math:`c(m) = \sum_{i=1}^{m} 1/i \approx \ln(m) + 0.577`.

BY is more conservative but valid under any dependence structure.

.. code-block:: python

   # Available in scipy
   fdr_pvals = false_discovery_control(pvals, method='by')

Storey's q-value
^^^^^^^^^^^^^^^^

Estimates the proportion of true nulls (:math:`\pi_0`) for improved power:

.. math::

   \hat{\pi}_0 = \frac{\#\{p_i > \lambda\}}{m(1 - \lambda)}

The q-value procedure is more powerful when many tests are true discoveries.

.. code-block:: python

   # Requires qvalue package
   # pip install qvalue
   from qvalue import qvalue
   q = qvalue(pvals)

Alternative Correction Methods
------------------------------

**Discrete FDR and Mid-P Adjustments**

For exact tests with discrete p-values (Fisher's exact, exact binomial),
specialized methods like Gilbert's procedure [Gilbert2005]_ or mid-p
adjustments can reduce conservativeness.

However, WASP2's likelihood ratio test produces continuous p-values from
the chi-squared distribution, so **standard BH is appropriate** and these
discrete methods are not needed.

Practical Guidelines
--------------------

Choosing an FDR Threshold
^^^^^^^^^^^^^^^^^^^^^^^^^

.. table:: FDR Threshold Guidelines
   :widths: 20 40 40

   ======== ================================= ================================
   FDR      Use Case                          Interpretation
   ======== ================================= ================================
   0.01     High-confidence discoveries       ~1% false among significant
   0.05     Standard exploratory analysis     ~5% false among significant
   0.10     Liberal discovery                 ~10% false, maximize sensitivity
   0.20     Hypothesis generation             For follow-up validation
   ======== ================================= ================================

When to Use Stricter Control
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Consider stricter FDR or FWER control when:

- Results will directly inform clinical decisions
- Follow-up validation is expensive
- The number of true positives is expected to be small
- Independence assumptions may be violated

Reporting Results
^^^^^^^^^^^^^^^^^

When reporting FDR-corrected results:

1. **Report the method**: "FDR correction was performed using the
   Benjamini-Hochberg procedure"
2. **Report the threshold**: "Significance was declared at FDR < 0.05"
3. **Report both p-values**: Include raw and adjusted p-values in supplements
4. **Report the number of tests**: "Among 15,234 tested regions..."

Output Format
-------------

WASP2 reports both raw and adjusted p-values:

.. table:: P-value Columns in Output
   :widths: 20 20 60

   ========= ======== ====================================================
   Column    Type     Description
   ========= ======== ====================================================
   pval      float    Raw p-value from likelihood ratio test
   fdr_pval  float    BH-adjusted p-value (q-value)
   ========= ======== ====================================================

**Interpretation**:

- ``pval < 0.05``: Nominally significant (not corrected)
- ``fdr_pval < 0.05``: Significant after multiple testing correction

References
----------

.. [Benjamini1995] Benjamini Y, Hochberg Y (1995). Controlling the false
   discovery rate: A practical and powerful approach to multiple testing.
   *Journal of the Royal Statistical Society B* 57:289-300.

.. [Storey2003] Storey JD, Tibshirani R (2003). Statistical significance for
   genomewide studies. *Proceedings of the National Academy of Sciences*
   100:9440-9445.

.. [Gilbert2005] Gilbert PB (2005). A modified false discovery rate
   multiple-comparisons procedure for discrete data, applied to human
   immunodeficiency virus genetics. *Journal of the Royal Statistical
   Society C* 54:143-158.