Counting Module#
Overview#
The counting module quantifies allele-specific read counts at heterozygous SNP positions. It’s the first step in allelic imbalance analysis.
Purpose#
Count reads supporting reference vs alternate alleles
Filter by sample genotype (heterozygous sites)
Annotate with genomic regions (genes, peaks)
Support single-cell RNA-seq
When to Use#
Use counting when you have:
Aligned reads (BAM file)
Variant calls (VCF file)
A need to quantify allele-specific expression
CLI Usage#
Basic Command#
wasp2-count count-variants BAM_FILE VCF_FILE
Full Options#
wasp2-count count-variants \
input.bam \
variants.vcf \
--samples sample1,sample2 \
--region genes.gtf \
--out_file counts.tsv
Input Requirements#
BAM File#
Aligned reads (single-end or paired-end)
Indexed (.bai file in same directory)
Sorted by coordinate
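The index requirement above can be checked before running the counter. The following is a minimal sketch, not part of WASP2: `has_bam_index` is a hypothetical helper name, and it assumes the index follows one of the two common naming conventions (`sample.bam.bai` or `sample.bai`) next to the BAM file.

```shell
# Hypothetical helper: succeed if a .bai index sits next to the BAM file.
# Checks both common index names: sample.bam.bai and sample.bai.
has_bam_index() {
  [ -f "$1.bai" ] || [ -f "${1%.bam}.bai" ]
}

# Example use (samtools must be installed for the indexing step):
# has_bam_index sample.bam || samtools index sample.bam
```

If the check fails, `samtools index` creates the index, provided the BAM file is already coordinate-sorted.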
VCF File#
Variant calls with genotype information
Heterozygous SNPs (GT=0|1 or 1|0)
Can include sample-specific genotypes
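To get a rough idea of how many sites a VCF offers for counting, the heterozygous calls can be tallied directly. This is an illustrative sketch only, not how WASP2 reads the file: `count_het_sites` is a hypothetical helper, it expects an uncompressed VCF, it looks only at the first sample column, and it also accepts the unphased forms `0/1` and `1/0` as an assumption (the requirements above list only the phased forms).

```shell
# Hypothetical helper: count records where the first sample is heterozygous.
# Assumes an uncompressed, tab-separated VCF with GT as the first FORMAT subfield.
count_het_sites() {
  awk -F'\t' '
    /^#/ { next }                      # skip meta and header lines
    {
      split($10, fields, ":")          # column 10 holds the first sample
      gt = fields[1]                   # GT comes first when it is present
      if (gt == "0|1" || gt == "1|0" || gt == "0/1" || gt == "1/0") n++
    }
    END { print n + 0 }                # print 0 when nothing matched
  ' "$1"
}

# Example use:
# count_het_sites variants.vcf
```

A result of zero suggests the file lacks genotype information or contains no heterozygous calls for that sample.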
Optional: Region File#
Annotate SNPs overlapping genes/peaks:
GTF/GFF3 format (genes)
BED format (peaks, regions)
narrowPeak format (ATAC-seq, ChIP-seq)
Parameters#
--samples / -s#
Filter SNPs heterozygous in specified samples:
--samples sample1,sample2,sample3
# or
--samples samples.txt # one per line
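Sample names passed to --samples presumably need to match the names in the VCF header. A small sketch for listing them from an uncompressed VCF follows; `vcf_samples` is a hypothetical helper, not a WASP2 command.

```shell
# Hypothetical helper: print the sample names from the #CHROM header line.
# In a VCF, sample columns start at column 10, after the FORMAT column.
vcf_samples() {
  awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}

# Example use: write all sample names to a file, one per line
# vcf_samples variants.vcf > samples.txt
```

The resulting file has one name per line, which is the layout described above for a samples file.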
--region / -r#
Annotate SNPs with overlapping regions:
--region genes.gtf # Gene annotations
--region peaks.bed # ATAC-seq peaks
--region regions.gff3 # Custom regions
--out_file / -o#
Output file path (default: counts.tsv):
--out_file my_counts.tsv
Output Format#
Tab-separated file with columns:
Basic Columns#
chr: Chromosome
pos: SNP position (1-based)
ref: Reference allele
alt: Alternate allele
ref_count: Reads supporting reference
alt_count: Reads supporting alternate
other_count: Reads supporting other alleles
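As a quick sanity check on the counts, the reference allele fraction ref_count / (ref_count + alt_count) can be computed per SNP. The sketch below assumes the output file starts with a header row using the column names listed above; that layout is an assumption, and `ref_fraction` is a hypothetical helper rather than part of WASP2.

```shell
# Hypothetical helper: print chr, pos and reference allele fraction per SNP.
# Assumes a tab-separated file whose first row names the columns.
ref_fraction() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map names to positions
    {
      ref = $(col["ref_count"])
      alt = $(col["alt_count"])
      total = ref + alt
      if (total > 0)                                           # skip SNPs with no ref/alt reads
        printf "%s\t%s\t%.3f\n", $(col["chr"]), $(col["pos"]), ref / total
    }
  ' "$1"
}

# Example use:
# ref_fraction counts.tsv
```

Looking columns up by name rather than position keeps the helper working when optional annotation columns are present. Fractions far from 0.5 at many sites may point to reference bias rather than true imbalance.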
Optional Columns (with --region)#
gene_id: Overlapping gene
gene_name: Gene symbol
feature: Feature type (exon, intron, etc.)
Example Workflow#
1. Basic Counting#
wasp2-count count-variants sample.bam variants.vcf
2. Filter by Sample#
wasp2-count count-variants \
sample.bam \
variants.vcf \
--samples NA12878
3. Annotate with Genes#
wasp2-count count-variants \
sample.bam \
variants.vcf \
--samples NA12878 \
--region genes.gtf \
--out_file counts_annotated.tsv
Single-Cell Counting#
For single-cell RNA-seq:
wasp2-count count-variants-sc \
sc_rnaseq.bam \
variants.vcf \
--barcode_map barcodes.tsv
Output includes cell-type-specific counts.
Common Issues#
Low Count Numbers#
Check BAM file coverage (samtools depth)
Verify VCF contains heterozygous SNPs
Ensure BAM and VCF use same reference genome
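One common symptom of a reference mismatch is differing chromosome names, for example chr1 in one file and 1 in the other. The sketch below compares two plain-text lists of contig names; `shared_contigs` is a hypothetical helper, and the commands in the comments show one assumed way to produce the lists.

```shell
# Hypothetical helper: print contig names present in both list files.
# Each input file holds one contig name per line.
shared_contigs() {
  # -F fixed strings, -x whole-line matches only, -f read patterns from a file
  grep -Fxf "$1" "$2" | sort -u
}

# One way to build the lists (requires samtools, uncompressed VCF):
# samtools idxstats sample.bam | cut -f1 > bam_contigs.txt
# grep -v '^#' variants.vcf | cut -f1 | sort -u > vcf_contigs.txt
# shared_contigs bam_contigs.txt vcf_contigs.txt
```

Empty output means the two files share no contig names, so no reads would overlap any variant.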
No Output SNPs#
Check if --samples filter is too restrictive
Verify VCF has genotype information (GT field)
Ensure BAM file is indexed
Next Steps#
After counting:
Analysis Module - Detect allelic imbalance
Mapping Module (WASP) - Correct reference bias with WASP