Analysis Module API
The analysis module provides statistical detection of allelic imbalance using beta-binomial models.
Core Statistical Engine
as_analysis
Author: Aaron Ho
Python Version: 3.9
- analysis.as_analysis.clamp_rho(rho)
Clamp the dispersion parameter rho to the safe range (epsilon, 1-epsilon).
The beta-binomial parameterization uses alpha = mu * (1-rho) / rho (and, symmetrically, beta = (1-mu) * (1-rho) / rho), so rho = 0 causes division by zero and rho = 1 makes both alpha and beta zero. Clamping prevents these boundary issues.
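A minimal sketch of the clamping and of the parameter conversion it protects. It assumes a scalar rho and an epsilon of 1e-10 (the module's actual epsilon is not stated here), and `rho_to_alpha_beta` is a hypothetical helper, not part of the module.

```python
EPSILON = 1e-10  # assumed value; the module's actual epsilon may differ


def clamp_rho(rho: float) -> float:
    """Clamp a scalar rho into [EPSILON, 1 - EPSILON]."""
    return min(max(rho, EPSILON), 1.0 - EPSILON)


def rho_to_alpha_beta(mu: float, rho: float) -> tuple[float, float]:
    """Hypothetical helper: map (mu, rho) to beta-binomial (alpha, beta)."""
    rho = clamp_rho(rho)
    scale = (1.0 - rho) / rho  # finite because rho >= EPSILON > 0
    return mu * scale, (1.0 - mu) * scale


# Without clamping, rho = 0 divides by zero and rho = 1 gives alpha = beta = 0.
alpha, beta = rho_to_alpha_beta(0.5, 0.0)
print(alpha, beta)  # very large but finite
```

The array-valued version would apply the same bounds elementwise.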
- analysis.as_analysis.opt_linear(disp_params, ref_counts, n_array)
Optimize the dispersion parameter, weighted by the total count N (function called by the optimizer).
- Parameters:
- Return type:
- Returns:
Negative log-likelihood value
- analysis.as_analysis.opt_prob(in_prob, in_rho, k, n, log=True)
Optimize the probability value that maximizes the imbalance likelihood (function called by the optimizer).
CRITICAL FUNCTION - Used by as_analysis_sc.py and compare_ai.py
- Parameters:
in_prob (float | ndarray[tuple[Any, ...], dtype[float64]]) – Probability parameter (scalar or array)
in_rho (float | ndarray[tuple[Any, ...], dtype[float64]]) – Dispersion parameter (scalar or array)
k (int | ndarray[tuple[Any, ...], dtype[integer[Any]]]) – Reference allele count(s)
n (int | ndarray[tuple[Any, ...], dtype[integer[Any]]]) – Total count(s)
log (bool) – If True, return negative log-likelihood; if False, return pmf
- Return type:
- Returns:
Negative log-likelihood (if log=True) or probability mass (if log=False)
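A scalar sketch of what this objective computes, assuming it is the beta-binomial log-pmf under the (mu, rho) parameterization described for clamp_rho. The log-pmf is written out with `math.lgamma` so the example needs no SciPy; `opt_prob_sketch` and `betabinom_logpmf` are illustrative names, and the real function also accepts arrays.

```python
import math

EPSILON = 1e-10  # assumed clamping epsilon


def betabinom_logpmf(k: int, n: int, alpha: float, beta: float) -> float:
    """Log pmf of the beta-binomial distribution via log-gamma functions."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_beta_num = math.lgamma(k + alpha) + math.lgamma(n - k + beta) - math.lgamma(n + alpha + beta)
    log_beta_den = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return log_choose + log_beta_num - log_beta_den


def opt_prob_sketch(in_prob: float, in_rho: float, k: int, n: int, log: bool = True) -> float:
    """Negative log-likelihood (log=True) or pmf (log=False) of k ref reads out of n."""
    rho = min(max(in_rho, EPSILON), 1.0 - EPSILON)
    alpha = in_prob * (1.0 - rho) / rho
    beta = (1.0 - in_prob) * (1.0 - rho) / rho
    log_pmf = betabinom_logpmf(k, n, alpha, beta)
    return -log_pmf if log else math.exp(log_pmf)


print(opt_prob_sketch(0.5, 0.1, 12, 20))
```

In this sketch in_prob must lie strictly inside (0, 1); at exactly 0 or 1 one of alpha or beta is zero and `math.lgamma(0)` raises an error.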
- analysis.as_analysis.opt_phased_new(prob, disp, ref_data, n_data, gt_data)
Optimize likelihood for phased data (updated version for single-cell analysis).
CRITICAL FUNCTION - Used by as_analysis_sc.py and compare_ai.py
- Parameters:
prob (float) – Probability parameter to optimize
disp (float | ndarray[tuple[Any, ...], dtype[float64]]) – Dispersion parameter (scalar or array)
ref_data (ndarray[tuple[Any, ...], dtype[integer[Any]]]) – Array of reference allele counts
n_data (ndarray[tuple[Any, ...], dtype[integer[Any]]]) – Array of total counts
gt_data (ndarray[tuple[Any, ...], dtype[integer[Any]]]) – Array of genotype phase information
- Return type:
- Returns:
Negative log-likelihood value
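One way a phased likelihood can be assembled, shown as a sketch. It assumes gt_data holds a 0/1 flag per SNP, where 1 means the reference allele sits on the other haplotype, so that SNP's probability flips from prob to 1 - prob (written as `abs(prob - gt)`). This encoding and the helper names are assumptions, not the module's documented behavior.

```python
import math

EPSILON = 1e-10  # assumed clamping epsilon


def neg_ll(prob: float, rho: float, k: int, n: int) -> float:
    """Beta-binomial negative log-likelihood (scalar stand-in for opt_prob)."""
    rho = min(max(rho, EPSILON), 1.0 - EPSILON)
    a = prob * (1.0 - rho) / rho
    b = (1.0 - prob) * (1.0 - rho) / rho
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
        - math.lgamma(a) - math.lgamma(b) + math.lgamma(a + b)
    )
    return -log_pmf


def phased_neg_ll(prob, disp, ref_data, n_data, gt_data):
    """Sum per-SNP negative log-likelihoods, flipping prob where gt == 1."""
    total = 0.0
    for k, n, gt in zip(ref_data, n_data, gt_data):
        snp_prob = abs(prob - gt)  # assumed encoding: gt=0 -> prob, gt=1 -> 1 - prob
        total += neg_ll(snp_prob, disp, k, n)
    return total


print(phased_neg_ll(0.7, 0.05, [14, 6], [20, 20], [0, 1]))
```

With this encoding, a SNP showing 6/20 reference reads on the flipped haplotype contributes the same evidence as 14/20 on the anchored one.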
- analysis.as_analysis.opt_unphased_dp(prob, disp, first_ref, first_n, phase_ref, phase_n)
Optimize the likelihood for unphased data, accounting for the unknown phase of each SNP using dynamic programming.
CRITICAL FUNCTION - Used by as_analysis_sc.py and compare_ai.py
- Parameters:
prob (float) – Probability parameter to optimize
disp (float | ndarray[tuple[Any, ...], dtype[float64]]) – Dispersion parameter (scalar or array)
first_ref (ndarray[tuple[Any, ...], dtype[integer[Any]]]) – Reference count for first position (length 1 array)
first_n (ndarray[tuple[Any, ...], dtype[integer[Any]]]) – Total count for first position (length 1 array)
phase_ref (ndarray[tuple[Any, ...], dtype[integer[Any]]]) – Array of reference counts for subsequent positions
phase_n (ndarray[tuple[Any, ...], dtype[integer[Any]]]) – Array of total counts for subsequent positions
- Return type:
- Returns:
Negative log-likelihood value
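A sketch of the dynamic-programming idea, under stated assumptions: the first SNP anchors the haplotype orientation, and each later SNP is independently in phase or flipped relative to it with probability 0.5 each. Under those assumptions the recursion prev = prev * (0.5 * p_same + 0.5 * p_flipped) reduces to a running product, kept here in log space. The function names are illustrative.

```python
import math

EPSILON = 1e-10  # assumed clamping epsilon


def bb_pmf(prob: float, rho: float, k: int, n: int) -> float:
    """Beta-binomial pmf (scalar stand-in for opt_prob(..., log=False))."""
    rho = min(max(rho, EPSILON), 1.0 - EPSILON)
    a = prob * (1.0 - rho) / rho
    b = (1.0 - prob) * (1.0 - rho) / rho
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
        - math.lgamma(a) - math.lgamma(b) + math.lgamma(a + b)
    )
    return math.exp(log_pmf)


def unphased_dp_neg_ll(prob, disp, first_ref, first_n, phase_ref, phase_n):
    """Negative log-likelihood with the first SNP anchoring the phase."""
    # The first SNP fixes which haplotype carries probability `prob`.
    total = -math.log(bb_pmf(prob, disp, first_ref[0], first_n[0]))
    # DP state: log-likelihood of all later SNPs processed so far,
    # summing over both orientations of each new SNP (assumed 0.5 prior each).
    log_prev = 0.0
    for k, n in zip(phase_ref, phase_n):
        same = bb_pmf(prob, disp, k, n)
        flipped = bb_pmf(1.0 - prob, disp, k, n)
        log_prev += math.log(0.5 * same + 0.5 * flipped)
    return total - log_prev


print(unphased_dp_neg_ll(0.7, 0.05, [14], [20], [6, 13], [20, 20]))
```

Because both orientations are summed, the result is unchanged whether a later SNP reads 6/20 or 14/20, which is the point of marginalizing over unknown phase.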
- analysis.as_analysis.parse_opt(df, disp=None, phased=False)
Optimize the imbalance proportion for the input data when running the model.
- Parameters:
- Return type:
- Returns:
Tuple of (alt_ll, mu) - likelihood of alternate model and imbalance proportion
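A sketch of producing an (alt_ll, mu) pair: minimize the summed beta-binomial negative log-likelihood over mu in (0, 1) with the dispersion held fixed. A plain golden-section search stands in for whatever bounded optimizer the module actually uses; `fit_mu` and `neg_ll` are illustrative names, and the inputs are plain lists rather than a DataFrame.

```python
import math

EPSILON = 1e-10  # assumed clamping epsilon


def neg_ll(prob: float, rho: float, k: int, n: int) -> float:
    """Beta-binomial negative log-likelihood of k reference reads out of n."""
    rho = min(max(rho, EPSILON), 1.0 - EPSILON)
    a = prob * (1.0 - rho) / rho
    b = (1.0 - prob) * (1.0 - rho) / rho
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
        - math.lgamma(a) - math.lgamma(b) + math.lgamma(a + b)
    )
    return -log_pmf


def fit_mu(ref_counts, n_counts, disp, lo=1e-6, hi=1.0 - 1e-6, iters=80):
    """Return (alt_ll, mu): the minimized summed negative log-likelihood and its mu."""
    def objective(mu: float) -> float:
        return sum(neg_ll(mu, disp, k, n) for k, n in zip(ref_counts, n_counts))

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # golden-ratio shrink factor, ~0.618
    a, b = lo, hi
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if objective(c) < objective(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    mu = (a + b) / 2.0
    return objective(mu), mu


alt_ll, mu = fit_mu([70, 140], [100, 200], disp=0.001)
print(alt_ll, mu)
```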
- analysis.as_analysis.single_model(df, region_col, phased=False)
Find allelic imbalance using the standard beta-binomial model.
- analysis.as_analysis.linear_model(df, region_col, phased=False)
Find allelic imbalance using the linear allelic imbalance model, in which the dispersion is weighted linearly by the total count N.
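Whichever dispersion model is used, a common way to turn likelihoods into a significance call is a likelihood-ratio test: compare a null model with the imbalance proportion fixed at 0.5 against the fitted alternative (for example the alt_ll returned by parse_opt). The sketch below assumes this one-degree-of-freedom test; it uses the identity P(chi-square with 1 df > x) = erfc(sqrt(x / 2)) so no SciPy is needed. `lrt_pvalue` is an illustrative name.

```python
import math


def lrt_pvalue(null_nll: float, alt_nll: float) -> float:
    """P-value of a 1-df likelihood-ratio test from two negative log-likelihoods."""
    # LRT statistic = 2 * (logL_alt - logL_null) = 2 * (nll_null - nll_alt);
    # clip at 0 to absorb tiny optimizer noise.
    lrt = max(2.0 * (null_nll - alt_nll), 0.0)
    # Survival function of chi-square(1): P(Z^2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(lrt / 2.0))


print(lrt_pvalue(null_nll=12.4, alt_nll=9.1))
```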
- analysis.as_analysis.get_imbalance(in_data, min_count=10, pseudocount=1, method='single', phased=False, region_col=None, groupby=None)
Process the input data and run the selected method to find allelic imbalance.
CRITICAL FUNCTION - Main analysis entry point used by run_analysis.py
- Parameters:
in_data (DataFrame | str | Path) – DataFrame with allele counts, or path to a TSV file
min_count (int) – Minimum allele count for analysis
pseudocount (int) – Pseudocount to add to allele counts
method (Literal['single', 'linear']) – Analysis method ("single" or "linear")
phased (bool) – Whether to use phased genotype information
region_col (str | None) – Column name to group variants by (e.g., gene, peak)
groupby (str | None) – Alternative grouping column (overrides region_col if provided)
- Return type:
DataFrame
- Returns:
DataFrame with imbalance statistics per region
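The parameters above do not spell out how min_count, pseudocount, region_col, and groupby interact. A pure-Python sketch of one plausible preprocessing step, assuming per-SNP rows with hypothetical keys `snp_id`, `ref_count`, and `alt_count`. Filtering on the raw total before adding pseudocounts, adding the pseudocount to both alleles, and treating each SNP as its own region when no grouping column is given are all assumptions, not documented behavior.

```python
from collections import defaultdict


def preprocess_counts(rows, min_count=10, pseudocount=1, region_col=None, groupby=None):
    """Filter SNP rows, add pseudocounts, and group (ref, n) pairs by region."""
    key_col = groupby if groupby is not None else region_col  # groupby takes precedence
    grouped = defaultdict(list)
    for row in rows:
        total = row["ref_count"] + row["alt_count"]
        if total < min_count:
            continue  # assumption: filter on raw counts, before pseudocounts
        ref = row["ref_count"] + pseudocount
        n = total + 2 * pseudocount  # assumption: pseudocount added to both alleles
        key = row[key_col] if key_col is not None else row["snp_id"]
        grouped[key].append((ref, n))
    return dict(grouped)


rows = [
    {"snp_id": "s1", "gene": "A", "ref_count": 8, "alt_count": 4},
    {"snp_id": "s2", "gene": "A", "ref_count": 3, "alt_count": 2},
    {"snp_id": "s3", "gene": "B", "ref_count": 10, "alt_count": 10},
]
print(preprocess_counts(rows, region_col="gene"))
```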
as_analysis_sc
Single-cell allelic imbalance analysis functions.
Provides functions for analyzing allelic imbalance in single-cell data stored in AnnData format with SNP counts in layers.
- analysis.as_analysis_sc.adata_count_qc(adata, z_cutoff=None, gt_error=None)
- Return type:
AnnData
Group Comparison
compare_ai
- analysis.compare_ai.get_imbalance_func(ref_count, n_count, phase_array=None)
Determine which imbalance function to use based on data characteristics.
- Parameters:
- Return type:
- Returns:
Tuple of (likelihood function, function arguments)
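A sketch of one plausible dispatch rule, assuming it chooses among the three likelihoods documented in as_analysis: opt_prob for a single SNP, opt_phased_new when phasing information is available, and opt_unphased_dp otherwise. To stay self-contained the sketch returns the function name as a string, whereas the documented function returns the callable itself. The argument tuples follow the documented signatures after the leading prob and disp arguments.

```python
from typing import Any, Optional, Sequence


def choose_likelihood(
    ref_count: Sequence[int],
    n_count: Sequence[int],
    phase_array: Optional[Sequence[int]] = None,
) -> tuple[str, tuple[Any, ...]]:
    """Return (likelihood function name, data arguments) for a region (sketch)."""
    if len(ref_count) == 1:
        # One SNP: phase is irrelevant, so use the plain beta-binomial likelihood.
        return "opt_prob", (ref_count[0], n_count[0])
    if phase_array is not None:
        # Several SNPs with known phase.
        return "opt_phased_new", (ref_count, n_count, phase_array)
    # Several SNPs, unknown phase: anchor on the first SNP, marginalize the rest.
    return "opt_unphased_dp", (ref_count[:1], n_count[:1], ref_count[1:], n_count[1:])


print(choose_likelihood([5, 6, 7], [10, 12, 15]))
```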
- analysis.compare_ai.opt_combined_imbalance(prob, disp, like_func1, like_func1_args, like_func2, like_func2_args)
Optimize combined imbalance likelihood for two groups.
- Parameters:
prob (float) – Probability parameter
disp (float) – Dispersion parameter
like_func1 (Callable[..., float]) – Likelihood function for group 1
like_func1_args (tuple[Any, ...]) – Arguments for group 1 likelihood function
like_func2 (Callable[..., float]) – Likelihood function for group 2
like_func2_args (tuple[Any, ...]) – Arguments for group 2 likelihood function
- Return type:
- Returns:
Combined negative log-likelihood
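A sketch of the combined objective, assuming the two groups are treated as independent so their negative log-likelihoods simply add while sharing one prob and one disp. This is the shape of a null model in which both groups have the same imbalance. `combined_neg_ll` and the toy likelihood are illustrative only.

```python
from typing import Any, Callable


def combined_neg_ll(
    prob: float,
    disp: float,
    like_func1: Callable[..., float],
    like_func1_args: tuple[Any, ...],
    like_func2: Callable[..., float],
    like_func2_args: tuple[Any, ...],
) -> float:
    """Joint negative log-likelihood of two groups sharing prob and disp."""
    # Independent groups: the joint likelihood is a product, so the
    # negative log-likelihoods add.
    return like_func1(prob, disp, *like_func1_args) + like_func2(prob, disp, *like_func2_args)


# Toy stand-in likelihood for illustration only: squared distance from a target.
def toy(prob: float, disp: float, target: float) -> float:
    return (prob - target) ** 2


print(combined_neg_ll(0.5, 0.1, toy, (0.2,), toy, (0.8,)))
```

In real use each like_func would be one of the likelihoods selected by get_imbalance_func, with its data tuple passed as like_func*_args.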
- analysis.compare_ai.get_compared_imbalance(adata, min_count=10, pseudocount=1, phased=False, sample=None, groups=None)
Compare allelic imbalance between groups using shared SNPs.
- Parameters:
adata (AnnData) – AnnData object containing SNP count data
min_count (int) – Minimum allele count threshold
pseudocount (int) – Pseudocount to add to avoid zero counts
phased (bool) – Whether to use phased analysis
sample (str | None) – Sample column name for phasing information
groups (list[str] | None) – List of groups to compare (if None, compare all)
- Return type:
- Returns:
Dict mapping (group1, group2) tuples to comparison DataFrames
- analysis.compare_ai.compare_imbalance_between_groups(disp, ref_counts1, n_counts1, ref_counts2, n_counts2, region_snp_dict, gt_array=None)
Compare allelic imbalance between two groups for shared regions.
- Parameters:
disp (float) – Dispersion parameter
ref_counts1 (ndarray[tuple[Any, ...], dtype[uint16]]) – Reference allele counts for group 1
n_counts1 (ndarray[tuple[Any, ...], dtype[uint16]]) – Total counts for group 1
ref_counts2 (ndarray[tuple[Any, ...], dtype[uint16]]) – Reference allele counts for group 2
n_counts2 (ndarray[tuple[Any, ...], dtype[uint16]]) – Total counts for group 2
region_snp_dict (dict[str, tuple[int, ...]]) – Dict mapping region names to SNP index tuples
gt_array (ndarray[tuple[Any, ...], dtype[uint8]] | None) – Optional genotype/phasing array
- Return type:
DataFrame
- Returns:
DataFrame with comparison statistics and p-values
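A sketch of how a between-group comparison can yield a p-value, assuming a likelihood-ratio test: the null fits one shared imbalance proportion to both groups, the alternative fits a separate proportion per group, and the difference in free parameters gives one degree of freedom. The golden-section search stands in for whatever optimizer the module uses, and all names here are illustrative; only a single region with unphased per-SNP likelihoods summed directly is handled.

```python
import math

EPSILON = 1e-10  # assumed clamping epsilon


def neg_ll(prob, rho, k, n):
    """Beta-binomial negative log-likelihood of k reference reads out of n."""
    rho = min(max(rho, EPSILON), 1.0 - EPSILON)
    a = prob * (1.0 - rho) / rho
    b = (1.0 - prob) * (1.0 - rho) / rho
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
        - math.lgamma(a) - math.lgamma(b) + math.lgamma(a + b)
    )
    return -log_pmf


def minimize_1d(f, lo=1e-6, hi=1.0 - 1e-6, iters=80):
    """Golden-section search; returns (min value, argmin) of f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    x = (a + b) / 2.0
    return f(x), x


def compare_two_groups(disp, ref1, n1, ref2, n2):
    """LRT of 'one shared mu' (null) against 'separate mu per group' (alternative)."""
    def group_nll(refs, ns):
        return lambda mu: sum(neg_ll(mu, disp, k, n) for k, n in zip(refs, ns))

    f1, f2 = group_nll(ref1, n1), group_nll(ref2, n2)
    null_nll, shared_mu = minimize_1d(lambda mu: f1(mu) + f2(mu))
    alt1, mu1 = minimize_1d(f1)
    alt2, mu2 = minimize_1d(f2)
    lrt = max(2.0 * (null_nll - (alt1 + alt2)), 0.0)
    pval = math.erfc(math.sqrt(lrt / 2.0))  # chi-square(1) survival function
    return {"mu1": mu1, "mu2": mu2, "shared_mu": shared_mu, "lrt": lrt, "pval": pval}


print(compare_two_groups(0.01, [80], [100], [20], [100]))
```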
- analysis.compare_ai.get_compared_imbalance_diff_snps(adata, min_count=10, pseudocount=1, phased=False, sample=None, groups=None)
Compare allelic imbalance between groups (V0 version, which does not require SNPs to be shared between groups).
- Parameters:
adata (AnnData) – AnnData object containing SNP count data
min_count (int) – Minimum allele count threshold
pseudocount (int) – Pseudocount to add to avoid zero counts
phased (bool) – Whether to use phased analysis
sample (str | None) – Sample column name for phasing information
groups (list[str] | None) – List of groups to compare (if None, compare all)
- Return type:
- Returns:
Dict mapping (group1, group2) tuples to comparison DataFrames
- analysis.compare_ai.compare_imbalance_between_groups_diff_snps(disp, ref_counts1, n_counts1, phase_array1, region_snp_dict1, ref_counts2, n_counts2, phase_array2, region_snp_dict2)
Compare allelic imbalance between two groups with different SNPs per region.
- Parameters:
disp (float) – Dispersion parameter
ref_counts1 (ndarray[tuple[Any, ...], dtype[uint16]]) – Reference allele counts for group 1
n_counts1 (ndarray[tuple[Any, ...], dtype[uint16]]) – Total counts for group 1
phase_array1 (ndarray[tuple[Any, ...], dtype[uint8]] | None) – Optional phasing array for group 1
region_snp_dict1 (dict[str, tuple[int, ...]]) – Dict mapping region names to SNP index tuples for group 1
ref_counts2 (ndarray[tuple[Any, ...], dtype[uint16]]) – Reference allele counts for group 2
n_counts2 (ndarray[tuple[Any, ...], dtype[uint16]]) – Total counts for group 2
phase_array2 (ndarray[tuple[Any, ...], dtype[uint8]] | None) – Optional phasing array for group 2
region_snp_dict2 (dict[str, tuple[int, ...]]) – Dict mapping region names to SNP index tuples for group 2
- Return type:
DataFrame
- Returns:
DataFrame with comparison statistics and p-values
Analysis Runners
run_analysis
Allelic imbalance analysis pipeline.
Main entry point for running the beta-binomial allelic imbalance analysis using the Rust-accelerated backend.
- class analysis.run_analysis.WaspAnalysisData(count_file, min_count=None, pseudocount=None, phased=None, model=None, out_file=None, region_col=None, groupby=None)
Bases: object
Container for allelic imbalance analysis configuration.
- model
Dispersion model type.
- Type:
Literal['single', 'linear']
- analysis.run_analysis.run_ai_analysis(count_file, min_count=None, pseudocount=None, phased=None, model=None, out_file=None, region_col=None, groupby=None)
Run allelic imbalance analysis pipeline.
- Parameters:
count_file (str | Path) – Path to TSV file with allele counts.
min_count (int | None, optional) – Minimum total count threshold, by default 10.
pseudocount (int | None, optional) – Pseudocount to add, by default 1.
phased (bool | None, optional) – Use phased genotype information, by default False.
model (str | None, optional) – Dispersion model (‘single’ or ‘linear’), by default ‘single’.
out_file (str | None, optional) – Output file path, by default ‘ai_results.tsv’.
region_col (str | None, optional) – Column name for grouping variants.
groupby (str | None, optional) – Additional grouping column.
- Raises:
RuntimeError – If Rust analysis extension is not available.
- Return type:
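The parameters above use None to mean "use the documented default". A hypothetical dataclass (`AnalysisConfig`, not part of the module) sketching how a container in the spirit of WaspAnalysisData could resolve those defaults; rejecting model values other than 'single' and 'linear' is an added assumption.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Union


@dataclass
class AnalysisConfig:
    """Hypothetical config mirroring run_ai_analysis arguments and defaults."""

    count_file: Union[str, Path]
    min_count: Optional[int] = None
    pseudocount: Optional[int] = None
    phased: Optional[bool] = None
    model: Optional[str] = None
    out_file: Optional[str] = None
    region_col: Optional[str] = None
    groupby: Optional[str] = None

    def __post_init__(self) -> None:
        # Replace None with the defaults listed in the parameter documentation.
        if self.min_count is None:
            self.min_count = 10
        if self.pseudocount is None:
            self.pseudocount = 1
        if self.phased is None:
            self.phased = False
        if self.model is None:
            self.model = "single"
        if self.out_file is None:
            self.out_file = "ai_results.tsv"
        # Assumed validation: only the two documented dispersion models.
        if self.model not in ("single", "linear"):
            raise ValueError(f"model must be 'single' or 'linear', got {self.model!r}")


print(AnalysisConfig("counts.tsv", region_col="gene"))
```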
run_analysis_sc#
- class analysis.run_analysis_sc.WaspAnalysisSC(adata_file, bc_map, min_count=None, pseudocount=None, phased=None, sample=None, groups=None, model=None, out_file=None, z_cutoff=None)
Bases: object
- class analysis.run_analysis_sc.AdataInputs(adata, sample, groups, phased)
Bases: NamedTuple
- adata: anndata.AnnData
Alias for field number 0
run_compare_ai
CLI Entry Point
- analysis.__main__.main(ctx, version=False, verbose=False, quiet=False)
WASP2 allelic imbalance analysis commands.
- Return type:
- analysis.__main__.find_imbalance(counts, min=None, pseudocount=None, out_file=None, phased=False, model=None, region_col=None, groupby=None)
- Return type: