Dispersion Parameter Estimation
================================

This document describes how WASP2 estimates the overdispersion parameter
:math:`\rho` in the beta-binomial model.

.. contents:: Contents
   :local:
   :depth: 2

The Dispersion Parameter
------------------------

The dispersion parameter :math:`\rho` quantifies the excess variance beyond
the binomial expectation. In the beta-binomial model:

.. math::

   \text{Var}(X) = N\mu(1-\mu)[1 + (N-1)\rho]

where:

- :math:`\rho = 0`: Binomial variance (no overdispersion)
- :math:`\rho > 0`: Variance inflation due to correlated sampling

Estimation Methods
------------------

WASP2 supports two approaches for dispersion estimation:

1. **Maximum Likelihood Estimation (MLE)**: Optimize the likelihood function
2. **Method of Moments (MoM)**: Solve equations based on sample moments

Both methods have trade-offs in terms of efficiency, bias, and computational cost.

Maximum Likelihood Estimation
-----------------------------

MLE finds the value of :math:`\rho` that maximizes the likelihood of the
observed data.

Single Dispersion Model
^^^^^^^^^^^^^^^^^^^^^^^

WASP2's default approach estimates a single :math:`\rho` across all observations:

.. math::

   \hat{\rho}_{\text{MLE}} = \arg\max_{\rho} \sum_{i=1}^{n} \log P(X_i | N_i, \mu=0.5, \rho)

Under the null hypothesis of no imbalance (:math:`\mu = 0.5`), the log-likelihood is:

.. math::

   \ell(\rho) = \sum_{i=1}^{n} \log \text{BetaBinom}(X_i; N_i, \alpha(\rho), \beta(\rho))

where :math:`\alpha = \beta = 0.5(1-\rho)/\rho`.

**Implementation**:

.. code-block:: python

   from scipy.optimize import minimize_scalar
   from scipy.stats import betabinom
   import numpy as np

   def neg_log_likelihood(rho, ref_counts, n_counts):
       alpha = 0.5 * (1 - rho) / rho
       beta = 0.5 * (1 - rho) / rho
       return -np.sum(betabinom.logpmf(ref_counts, n_counts, alpha, beta))

   result = minimize_scalar(
       neg_log_likelihood,
       args=(ref_array, n_array),
       method='bounded',
       bounds=(1e-6, 1 - 1e-6)
   )
   rho_mle = result.x

**Properties of MLE**:

- Asymptotically unbiased as :math:`n \to \infty`
- Achieves the Cramér-Rao lower bound (efficient)
- Computationally requires numerical optimization
- May be sensitive to outliers

Linear Dispersion Model
^^^^^^^^^^^^^^^^^^^^^^^

For large datasets with variable coverage, WASP2 offers a model where
dispersion varies linearly with total count on the logit scale:

.. math::

   \text{logit}(\rho_i) = \beta_0 + \beta_1 \cdot N_i

The logit link ensures :math:`\rho_i \in (0, 1)`:

.. math::

   \rho_i = \frac{\exp(\beta_0 + \beta_1 N_i)}{1 + \exp(\beta_0 + \beta_1 N_i)}

**Motivation**:

Empirically, regions with different coverage levels may exhibit different
dispersion characteristics:

- **Low coverage**: Greater sampling noise, potentially higher apparent dispersion
- **High coverage**: More stable estimates, may reveal true biological variance

**Implementation**:

.. code-block:: python

   from scipy.optimize import minimize
   from scipy.special import expit

   def neg_ll_linear(params, ref_counts, n_counts):
       beta0, beta1 = params
       logit_rho = beta0 + beta1 * n_counts
       # Clip to avoid numerical issues
       logit_rho = np.clip(logit_rho, -10, 10)
       rho = expit(logit_rho)
       alpha = 0.5 * (1 - rho) / rho
       beta = 0.5 * (1 - rho) / rho
       return -np.sum(betabinom.logpmf(ref_counts, n_counts, alpha, beta))

   result = minimize(
       neg_ll_linear,
       x0=(0, 0),
       method='Nelder-Mead',
       args=(ref_array, n_array)
   )
   beta0, beta1 = result.x

Method of Moments
-----------------

MoM estimates :math:`\rho` by equating theoretical and sample moments.

Variance-Based Estimator
^^^^^^^^^^^^^^^^^^^^^^^^

For a beta-binomial with :math:`\mu = 0.5`, the variance is:

.. math::

   \text{Var}(X) = \frac{N}{4}[1 + (N-1)\rho]

Solving for :math:`\rho`:

.. math::

   \hat{\rho}_{\text{MoM}} = \frac{4S^2/N - 1}{N - 1}

where :math:`S^2` is the sample variance of :math:`X/N` (the allelic ratio).

**Pooled Estimator**:

For observations with varying :math:`N`:

.. math::

   \hat{\rho}_{\text{MoM}} = \frac{\sum_i (X_i - N_i/2)^2 - \sum_i N_i/4}{\sum_i N_i(N_i-1)/4}

**Properties of MoM**:

- Closed-form solution (fast computation)
- May produce negative estimates (truncate to 0)
- Less efficient than MLE, especially for small samples
- More robust to model misspecification

Comparison: MLE vs MoM
----------------------

.. table:: MLE vs MoM for Dispersion Estimation
   :widths: 25 37 38

   =================== ============================== ==============================
   Property            MLE                            MoM
   =================== ============================== ==============================
   Computation         Iterative optimization         Closed-form
   Efficiency          Optimal (achieves CRLB)        Suboptimal
   Bias                Asymptotically unbiased        May be biased for small n
   Robustness          Sensitive to outliers          More robust
   Boundary behavior   Always in (0,1)                May give ρ < 0
   WASP2 default       Yes                            No
   =================== ============================== ==============================

WASP2 uses MLE because:

1. The beta-binomial likelihood is well-behaved
2. Modern optimization is fast enough for typical datasets
3. MLE provides consistent estimates across sample sizes

Practical Considerations
------------------------

Convergence Issues
^^^^^^^^^^^^^^^^^^

The MLE optimizer may fail to converge if:

- :math:`\rho` is very close to 0 (nearly binomial data)
- :math:`\rho` is very close to 1 (extreme overdispersion)
- The data contains extreme outliers

WASP2 handles these by:

- Bounding :math:`\rho` away from 0 and 1
- Clipping logit values to avoid overflow
- Using robust optimization methods (bounded, Nelder-Mead)

Sample Size Requirements
^^^^^^^^^^^^^^^^^^^^^^^^

MLE performance depends on sample size:

.. table:: Dispersion Estimate Quality by Sample Size
   :widths: 25 25 50

   ============ ================ =====================================
   n (regions)  CV of estimate   Recommendation
   ============ ================ =====================================
   < 50         > 50%            Use pooled estimate or prior
   50-200       20-50%           MLE reasonable but uncertain
   200-1000     10-20%           MLE reliable
   > 1000       < 10%            MLE highly accurate
   ============ ================ =====================================

Model Selection
^^^^^^^^^^^^^^^

Choosing between single and linear dispersion models:

**Use Single Dispersion When**:

- Dataset is small (< 1000 regions)
- Coverage is relatively uniform
- Quick analysis is needed

**Use Linear Dispersion When**:

- Large dataset (> 10,000 regions)
- Wide range of coverage values
- Systematic coverage-dispersion relationship suspected

Model comparison can be done via AIC/BIC:

.. math::

   \text{AIC} = 2k - 2\ell(\hat{\theta})

where :math:`k` is the number of parameters (1 for single, 2 for linear).

See Also
--------

- :doc:`fdr_correction` - Multiple testing correction after estimation

References
----------

.. [Robinson2010] Robinson MD, Smyth GK (2010). Small-sample estimation of
   negative binomial dispersion, with applications to SAGE data.
   *Biostatistics* 9:321-332.

.. [Yu2013] Yu D, Huber W, Vitek O (2013). Shrinkage estimation of dispersion
   in Negative Binomial models for RNA-seq experiments with small sample size.
   *Bioinformatics* 29:1275-1282.