Analysis Module API#

The analysis module provides statistical detection of allelic imbalance using beta-binomial models.

Core Statistical Engine#

as_analysis#

Author: Aaron Ho

Python Version: 3.9

analysis.as_analysis.clamp_rho(rho)[source]#

Clamp the dispersion parameter rho to the safe range (RHO_EPSILON, 1 - RHO_EPSILON).

The beta-binomial parameterization uses alpha = mu * (1-rho) / rho, which causes division by zero when rho=0 and produces zero alpha/beta when rho=1. This function prevents these boundary issues.

Parameters:

rho (float | ndarray[tuple[Any, ...], dtype[float64]]) – Dispersion parameter (scalar or array), expected in [0, 1]

Return type:

float | ndarray[tuple[Any, ...], dtype[float64]]

Returns:

Clamped rho in range (RHO_EPSILON, 1 - RHO_EPSILON)
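As a minimal sketch of the boundary problem described above, the scalar helper below clamps rho and then derives the beta-binomial shape parameters. The value of RHO_EPSILON, the helper names, and the beta formula (the usual companion to the documented alpha formula) are assumptions for illustration and are not taken from the module; for array input, `numpy.clip` would be the natural equivalent.

```python
RHO_EPSILON = 1e-10  # assumed value; the module defines its own constant


def clamp_rho_scalar(rho: float, eps: float = RHO_EPSILON) -> float:
    """Clamp a scalar rho so it never reaches 0 or 1 (hypothetical helper)."""
    return min(max(rho, eps), 1.0 - eps)


def alpha_beta(mu: float, rho: float) -> tuple:
    """Shape parameters from mean mu and dispersion rho.

    alpha follows the documented formula mu * (1 - rho) / rho;
    beta = (1 - mu) * (1 - rho) / rho is assumed as its counterpart.
    """
    rho = clamp_rho_scalar(rho)
    scale = (1.0 - rho) / rho
    return mu * scale, (1.0 - mu) * scale


# Without clamping, rho = 0 would divide by zero and rho = 1 would give 0, 0.
print(alpha_beta(0.5, 0.0))
print(alpha_beta(0.5, 1.0))
```

Clamping before the division keeps both shape parameters finite and strictly positive, which is what a beta-binomial likelihood requires.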

analysis.as_analysis.opt_linear(disp_params, ref_counts, n_array)[source]#

Objective function for optimizing the dispersion parameters of the linear model, in which dispersion is weighted by the total count N. Called by the optimizer.

Return type:

float

Returns:

Negative log-likelihood value
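One plausible shape for such an objective is sketched below, using only the standard library. It assumes that `disp_params` holds an intercept and a slope, that rho is obtained by passing `intercept + slope * N` through a logistic function, and that the null mean is 0.5. None of these details are confirmed by this page, and the function names are hypothetical.

```python
import math


def _betabinom_logpmf(k, n, a, b):
    """Log PMF of the beta-binomial distribution via log-gamma."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_num = math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
    log_den = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return log_choose + log_num - log_den


def linear_dispersion_nll(disp_params, ref_counts, n_array):
    """Negative log-likelihood with a count-dependent dispersion (assumed form)."""
    intercept, slope = disp_params
    total = 0.0
    for k, n in zip(ref_counts, n_array):
        # Bound the linear predictor so the logistic never saturates to 0 or 1.
        z = max(-10.0, min(10.0, intercept + slope * n))
        rho = 1.0 / (1.0 + math.exp(-z))
        # Null model: mean 0.5, so alpha and beta are equal.
        shape = 0.5 * (1.0 - rho) / rho
        total -= _betabinom_logpmf(k, n, shape, shape)
    return total


print(linear_dispersion_nll((-2.0, 0.01), [3, 7], [10, 20]))
```

An optimizer such as `scipy.optimize.minimize` could then search over the two entries of `disp_params`.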

analysis.as_analysis.opt_prob(in_prob, in_rho, k, n, log=True)[source]#

Objective function for finding the probability value that maximizes the imbalance likelihood. Called by the optimizer.

CRITICAL FUNCTION - Used by as_analysis_sc.py and compare_ai.py

Return type:

float | ndarray[tuple[Any, ...], dtype[float64]]

Returns:

Negative log-likelihood (if log=True) or probability mass (if log=False)
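The return convention above can be illustrated with a scalar, standard-library sketch. It assumes the parameterization alpha = prob * (1 - rho) / rho and beta = (1 - prob) * (1 - rho) / rho; the function names are hypothetical and the real function also accepts arrays.

```python
import math


def betabinom_logpmf(k, n, alpha, beta):
    """Log PMF of the beta-binomial distribution via log-gamma."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_num = math.lgamma(k + alpha) + math.lgamma(n - k + beta) - math.lgamma(n + alpha + beta)
    log_den = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return log_choose + log_num - log_den


def betabinom_objective(prob, rho, k, n, log=True):
    """Negative log-likelihood if log=True, otherwise the probability mass."""
    scale = (1.0 - rho) / rho
    log_p = betabinom_logpmf(k, n, prob * scale, (1.0 - prob) * scale)
    return -log_p if log else math.exp(log_p)


# The mass over all possible reference counts sums to 1.
print(sum(betabinom_objective(0.3, 0.2, k, 10, log=False) for k in range(11)))
```

Returning the negative log-likelihood lets a minimizer be used directly, while the `log=False` branch supplies raw probabilities for code that needs to sum over alternatives.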

analysis.as_analysis.opt_phased_new(prob, disp, ref_data, n_data, gt_data)[source]#

Optimize likelihood for phased data (updated version for single-cell analysis).

CRITICAL FUNCTION - Used by as_analysis_sc.py and compare_ai.py

Return type:

float

Returns:

Negative log-likelihood value

analysis.as_analysis.opt_unphased_dp(prob, disp, first_ref, first_n, phase_ref, phase_n)[source]#

Optimize likelihood while taking phase into account using dynamic programming.

CRITICAL FUNCTION - Used by as_analysis_sc.py and compare_ai.py

Return type:

float

Returns:

Negative log-likelihood value
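A hedged sketch of how phase uncertainty can be handled with a forward recursion: the first site fixes the orientation, and every later site contributes the average of its likelihood under the two possible phases. This is an assumed reading of "dynamic programming" here, not the module's confirmed algorithm. The `site_pmf` callable and the binomial stand-in are hypothetical, and the first site is passed as scalars for simplicity.

```python
import math


def unphased_nll(prob, disp, first_ref, first_n, phase_ref, phase_n, site_pmf):
    """Negative log-likelihood marginalized over the phase of later sites."""
    first_like = site_pmf(prob, disp, first_ref, first_n)
    running = 1.0
    for k, n in zip(phase_ref, phase_n):
        same = site_pmf(prob, disp, k, n)           # site in phase with the first
        flipped = site_pmf(1.0 - prob, disp, k, n)  # site on the other haplotype
        running *= 0.5 * same + 0.5 * flipped       # equal prior on both phases
    return -math.log(first_like) - math.log(running)


def binom_pmf(prob, _disp, k, n):
    """Plain binomial stand-in for a per-site likelihood (ignores dispersion)."""
    return math.comb(n, k) * prob**k * (1.0 - prob) ** (n - k)


print(unphased_nll(0.7, 0.1, 8, 10, [2, 9], [10, 12], binom_pmf))
```

Because each later site averages over both orientations, swapping its reference and alternate counts leaves the result unchanged.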

analysis.as_analysis.parse_opt(df, disp=None, phased=False)[source]#

Prepare the allele-count data and run the optimization needed to fit the model.

Parameters:
  • df (DataFrame) – Dataframe with allele counts

  • disp (float | ndarray[tuple[Any, ...], dtype[float64]] | None) – pre-computed dispersion parameter, defaults to None

  • phased (bool) – Whether data is phased

Return type:

tuple[float, float]

Returns:

Tuple of (alt_ll, mu) - likelihood of alternate model and imbalance proportion
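A returned alternate-model likelihood is typically compared against a null model. The sketch below shows a likelihood-ratio test with one degree of freedom, under the assumption that both inputs are log-likelihoods (larger is better) and that the null fixes the imbalance proportion at 0.5. The function name is hypothetical, and this page does not confirm that the module computes p-values this way.

```python
import math


def lrt_pvalue(null_ll, alt_ll):
    """Likelihood-ratio statistic and chi-square (1 df) p-value."""
    stat = max(0.0, -2.0 * (null_ll - alt_ll))
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


stat, pval = lrt_pvalue(null_ll=-105.2, alt_ll=-101.7)
print(stat, pval)
```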

analysis.as_analysis.single_model(df, region_col, phased=False)[source]#

Find allelic imbalance using the standard (single-dispersion) beta-binomial model.

Parameters:
  • df (DataFrame) – Dataframe with allele counts

  • region_col (str) – Name of column to group by

  • phased (bool) – Whether data is phased

Return type:

DataFrame

Returns:

Dataframe with imbalance likelihood

analysis.as_analysis.linear_model(df, region_col, phased=False)[source]#

Find allelic imbalance using the linear allelic imbalance model, which weights imbalance linearly with the total count N.

Parameters:
  • df (DataFrame) – Dataframe with allele counts

  • region_col (str) – Name of column to group by

  • phased (bool) – Whether data is phased

Return type:

DataFrame

Returns:

Dataframe with imbalance likelihood

analysis.as_analysis.get_imbalance(in_data, min_count=10, pseudocount=1, method='single', phased=False, region_col=None, groupby=None)[source]#

Process input data and method for finding allelic imbalance.

CRITICAL FUNCTION - Main analysis entry point used by run_analysis.py

Parameters:
  • in_data (DataFrame | str | Path) – Dataframe with allele counts or filepath to TSV file

  • min_count (int) – minimum allele count for analysis

  • pseudocount (int) – pseudocount to add to allele counts

  • method (Literal['single', 'linear']) – analysis method ("single" or "linear")

  • phased (bool) – whether to use phased genotype information

  • region_col (str | None) – column name to group variants by (e.g., gene, peak)

  • groupby (str | None) – alternative grouping column (overrides region_col if provided)

Return type:

DataFrame

Returns:

DataFrame with imbalance statistics per region
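The roles of `min_count`, `pseudocount`, `region_col`, and `groupby` can be sketched on plain dictionaries. The column names `ref_count` and `alt_count`, the helper name, and the order of operations (filter on raw totals, then add the pseudocount to each allele) are assumptions for illustration; only the rule that `groupby` overrides `region_col` comes from the parameter list above.

```python
def prepare_counts(rows, min_count=10, pseudocount=1, region_col=None, groupby=None):
    """Filter low-coverage rows and add pseudocounts (hypothetical helper)."""
    # groupby takes precedence over region_col when both are given.
    group_col = groupby if groupby is not None else region_col
    kept = []
    for row in rows:
        if row["ref_count"] + row["alt_count"] < min_count:
            continue  # too few reads to analyze
        ref = row["ref_count"] + pseudocount
        alt = row["alt_count"] + pseudocount
        kept.append({**row, "ref_count": ref, "alt_count": alt, "N": ref + alt})
    return group_col, kept


rows = [
    {"region": "peak1", "gene": "A", "ref_count": 12, "alt_count": 3},
    {"region": "peak2", "gene": "A", "ref_count": 2, "alt_count": 1},
]
print(prepare_counts(rows, region_col="region", groupby="gene"))
```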

as_analysis_sc#

Single-cell allelic imbalance analysis functions.

Provides functions for analyzing allelic imbalance in single-cell data stored in AnnData format with SNP counts in layers.

analysis.as_analysis_sc.adata_count_qc(adata, z_cutoff=None, gt_error=None)[source]#

Return type:

AnnData

analysis.as_analysis_sc.get_imbalance_sc(adata, min_count=10, pseudocount=1, phased=False, sample=None, groups=None)[source]#

Return type:

dict[str, DataFrame]

analysis.as_analysis_sc.get_imbalance_per_group(ref_counts, n_counts, region_snp_dict, disp, gt_array=None)[source]#

Return type:

DataFrame

Group Comparison#

compare_ai#

analysis.compare_ai.get_imbalance_func(ref_count, n_count, phase_array=None)[source]#

Determine which imbalance function to use based on data characteristics.

Return type:

tuple[Callable[..., Any], tuple[Any, ...]]

Returns:

Tuple of (likelihood function, function arguments)
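The (function, arguments) return pattern can be sketched as a small dispatcher. The selection criteria below (single SNP, phase information present, otherwise unphased) are assumptions, and the three likelihood functions are empty placeholders. The argument tuples mirror the trailing parameters of `opt_prob`, `opt_phased_new`, and `opt_unphased_dp` as listed in their signatures.

```python
def single_site_nll(prob, disp, k, n):
    raise NotImplementedError  # placeholder for a single-SNP likelihood


def phased_nll(prob, disp, ref_data, n_data, gt_data):
    raise NotImplementedError  # placeholder for a phased likelihood


def unphased_nll(prob, disp, first_ref, first_n, phase_ref, phase_n):
    raise NotImplementedError  # placeholder for an unphased likelihood


def choose_likelihood(ref_count, n_count, phase_array=None):
    """Pick a likelihood function and its data arguments (assumed criteria)."""
    if len(ref_count) == 1:
        return single_site_nll, (ref_count[0], n_count[0])
    if phase_array is not None:
        return phased_nll, (ref_count, n_count, phase_array)
    return unphased_nll, (ref_count[0], n_count[0], ref_count[1:], n_count[1:])


func, args = choose_likelihood([5, 8], [10, 12])
print(func.__name__, args)
```

A caller can then evaluate `func(prob, disp, *args)` without knowing which model was selected.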

analysis.compare_ai.opt_combined_imbalance(prob, disp, like_func1, like_func1_args, like_func2, like_func2_args)[source]#

Optimize combined imbalance likelihood for two groups.

Parameters:
  • prob (float) – Probability parameter

  • disp (float) – Dispersion parameter

  • like_func1 (Callable[..., float]) – Likelihood function for group 1

  • like_func1_args (tuple[Any, ...]) – Arguments for group 1 likelihood function

  • like_func2 (Callable[..., float]) – Likelihood function for group 2

  • like_func2_args (tuple[Any, ...]) – Arguments for group 2 likelihood function

Return type:

float

Returns:

Combined negative log-likelihood
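A minimal sketch of the combination, assuming each likelihood function is called as `func(prob, disp, *args)` and that the two negative log-likelihoods are simply added, so both groups share one probability parameter. The binomial stand-in and the grid search are illustrative only.

```python
import math


def combined_nll(prob, disp, like_func1, like_func1_args, like_func2, like_func2_args):
    """Sum of two groups' negative log-likelihoods at a shared prob and disp."""
    return (like_func1(prob, disp, *like_func1_args)
            + like_func2(prob, disp, *like_func2_args))


def binom_nll(prob, _disp, k, n):
    """Binomial stand-in for a group likelihood (ignores dispersion)."""
    return -(math.log(math.comb(n, k)) + k * math.log(prob) + (n - k) * math.log(1.0 - prob))


# A coarse grid search for the shared probability across both groups.
grid = [i / 100 for i in range(1, 100)]
best = min(grid, key=lambda p: combined_nll(p, 0.1, binom_nll, (3, 10), binom_nll, (9, 10)))
print(best)
```

Fitting a single shared probability gives the null model for a between-group comparison, against which separately fitted probabilities can be tested.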

analysis.compare_ai.get_compared_imbalance(adata, min_count=10, pseudocount=1, phased=False, sample=None, groups=None)[source]#

Compare allelic imbalance between groups using shared SNPs.

Parameters:
  • adata (AnnData) – AnnData object containing SNP count data

  • min_count (int) – Minimum allele count threshold

  • pseudocount (int) – Pseudocount to add to avoid zero counts

  • phased (bool) – Whether to use phased analysis

  • sample (str | None) – Sample column name for phasing information

  • groups (list[str] | None) – List of groups to compare (if None, compare all)

Return type:

dict[tuple[str, str], DataFrame]

Returns:

Dict mapping (group1, group2) tuples to comparison DataFrames
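The shape of the returned dictionary can be sketched with `itertools.combinations`. It is an assumption that each unordered pair appears exactly once and in input order; the string values stand in for the comparison DataFrames.

```python
from itertools import combinations


def group_pairs(groups):
    """All unordered pairs of groups, in input order (hypothetical helper)."""
    return list(combinations(groups, 2))


# Placeholder values stand in for the per-pair comparison DataFrames.
results = {pair: f"comparison of {pair[0]} vs {pair[1]}"
           for pair in group_pairs(["B", "T", "NK"])}

for (group1, group2), table in results.items():
    print(group1, group2, table)
```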

analysis.compare_ai.compare_imbalance_between_groups(disp, ref_counts1, n_counts1, ref_counts2, n_counts2, region_snp_dict, gt_array=None)[source]#

Compare allelic imbalance between two groups for shared regions.

Return type:

DataFrame

Returns:

DataFrame with comparison statistics and p-values

analysis.compare_ai.get_compared_imbalance_diff_snps(adata, min_count=10, pseudocount=1, phased=False, sample=None, groups=None)[source]#

Compare allelic imbalance between groups (V0 version without shared SNPs).

Parameters:
  • adata (AnnData) – AnnData object containing SNP count data

  • min_count (int) – Minimum allele count threshold

  • pseudocount (int) – Pseudocount to add to avoid zero counts

  • phased (bool) – Whether to use phased analysis

  • sample (str | None) – Sample column name for phasing information

  • groups (list[str] | None) – List of groups to compare (if None, compare all)

Return type:

dict[tuple[str, str], DataFrame]

Returns:

Dict mapping (group1, group2) tuples to comparison DataFrames

analysis.compare_ai.compare_imbalance_between_groups_diff_snps(disp, ref_counts1, n_counts1, phase_array1, region_snp_dict1, ref_counts2, n_counts2, phase_array2, region_snp_dict2)[source]#

Compare allelic imbalance between two groups with different SNPs per region.

Return type:

DataFrame

Returns:

DataFrame with comparison statistics and p-values

Analysis Runners#

run_analysis#

Allelic imbalance analysis pipeline.

Main entry point for running the beta-binomial allelic imbalance analysis using the Rust-accelerated backend.

class analysis.run_analysis.WaspAnalysisData(count_file, min_count=None, pseudocount=None, phased=None, model=None, out_file=None, region_col=None, groupby=None)[source]#

Bases: object

Container for allelic imbalance analysis configuration.

count_file#

Path to the count TSV file.

Type:

str | Path

region_col#

Column name for grouping variants by region.

Type:

str | None

groupby#

Column name for additional grouping (e.g., parent gene).

Type:

str | None

out_file#

Output file path for results.

Type:

str

phased#

Whether to use phased genotype information.

Type:

bool

model#

Dispersion model type.

Type:

Literal['single', 'linear']

min_count#

Minimum total allele count threshold.

Type:

int

pseudocount#

Pseudocount to add to allele counts.

Type:

int

__init__(count_file, min_count=None, pseudocount=None, phased=None, model=None, out_file=None, region_col=None, groupby=None)[source]#

analysis.run_analysis.run_ai_analysis(count_file, min_count=None, pseudocount=None, phased=None, model=None, out_file=None, region_col=None, groupby=None)[source]#

Run allelic imbalance analysis pipeline.

Parameters:
  • count_file (str | Path) – Path to TSV file with allele counts.

  • min_count (int | None, optional) – Minimum total count threshold, by default 10.

  • pseudocount (int | None, optional) – Pseudocount to add, by default 1.

  • phased (bool | None, optional) – Use phased genotype information, by default False.

  • model (str | None, optional) – Dispersion model ('single' or 'linear'), by default 'single'.

  • out_file (str | None, optional) – Output file path, by default 'ai_results.tsv'.

  • region_col (str | None, optional) – Column name for grouping variants.

  • groupby (str | None, optional) – Additional grouping column.

Raises:

RuntimeError – If Rust analysis extension is not available.

Return type:

None
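Every optional argument in the signature defaults to None, while the parameter list documents effective defaults. One way to reconcile the two is to resolve None after construction, as in the sketch below. `AnalysisConfig` is a hypothetical stand-in for `WaspAnalysisData`, not its actual implementation, and it uses `Optional`/`Union` to stay compatible with Python 3.9.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Union


@dataclass
class AnalysisConfig:
    """Hypothetical container that fills in the documented defaults."""
    count_file: Union[str, Path]
    min_count: Optional[int] = None
    pseudocount: Optional[int] = None
    phased: Optional[bool] = None
    model: Optional[str] = None
    out_file: Optional[str] = None
    region_col: Optional[str] = None
    groupby: Optional[str] = None

    def __post_init__(self):
        # Replace None with the defaults listed in the parameter documentation.
        if self.min_count is None:
            self.min_count = 10
        if self.pseudocount is None:
            self.pseudocount = 1
        if self.phased is None:
            self.phased = False
        if self.model is None:
            self.model = "single"
        if self.out_file is None:
            self.out_file = "ai_results.tsv"


cfg = AnalysisConfig("counts.tsv", model="linear")
print(cfg)
```

Using None as a sentinel lets a command-line layer pass through unset options without duplicating the default values.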

run_analysis_sc#

class analysis.run_analysis_sc.WaspAnalysisSC(adata_file, bc_map, min_count=None, pseudocount=None, phased=None, sample=None, groups=None, model=None, out_file=None, z_cutoff=None)[source]#

Bases: object

__init__(adata_file, bc_map, min_count=None, pseudocount=None, phased=None, sample=None, groups=None, model=None, out_file=None, z_cutoff=None)[source]#

update_data(data)[source]#

Return type:

None

class analysis.run_analysis_sc.AdataInputs(adata, sample, groups, phased)[source]#

Bases: NamedTuple

adata: anndata.AnnData#

Alias for field number 0

sample: str#

Alias for field number 1

groups: list[str]#

Alias for field number 2

phased: bool#

Alias for field number 3
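Because the class derives from NamedTuple, its fields can be read by name, by position, or by unpacking. The sketch below re-declares an equivalent tuple with `Any` in place of `anndata.AnnData` so that it runs without the anndata dependency; the sample and group values are made up.

```python
from typing import Any, List, NamedTuple


class AdataInputs(NamedTuple):
    adata: Any          # anndata.AnnData in the real module
    sample: str         # field number 1
    groups: List[str]   # field number 2
    phased: bool        # field number 3


inputs = AdataInputs(adata=None, sample="sample1", groups=["B", "T"], phased=False)

# Access by name, by index, or by unpacking in field order.
print(inputs.sample, inputs[1])
adata, sample, groups, phased = inputs
```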

analysis.run_analysis_sc.process_adata_inputs(adata, ai_files=None, bc_map=None, sample=None, groups=None, phased=None)[source]#

Return type:

AdataInputs

analysis.run_analysis_sc.run_ai_analysis_sc(count_file, bc_map, min_count=None, pseudocount=None, phase=None, sample=None, groups=None, out_file=None, z_cutoff=None)[source]#

Return type:

None

run_compare_ai#

analysis.run_compare_ai.run_ai_comparison(count_file, bc_map, min_count=None, pseudocount=None, phase=None, sample=None, groups=None, out_file=None, z_cutoff=None)[source]#

Return type:

None

CLI Entry Point#

analysis.__main__.main(ctx, version=False, verbose=False, quiet=False)[source]#

WASP2 allelic imbalance analysis commands.

Return type:

None

analysis.__main__.find_imbalance(counts, min=None, pseudocount=None, out_file=None, phased=False, model=None, region_col=None, groupby=None)[source]#

Return type:

None

analysis.__main__.find_imbalance_sc(counts, bc_map, min=None, pseudocount=None, sample=None, groups=None, phased=None, out_file=None, z_cutoff=None)[source]#

Return type:

None

analysis.__main__.compare_imbalance(counts, bc_map, min=None, pseudocount=None, sample=None, groups=None, phased=None, out_file=None, z_cutoff=None)[source]#

Return type:

None