Counting Module#

Overview#

The counting module quantifies allele-specific read counts at heterozygous SNP positions. It’s the first step in allelic imbalance analysis.

Purpose#

  • Count reads supporting reference vs alternate alleles

  • Filter by sample genotype (heterozygous sites)

  • Annotate with genomic regions (genes, peaks)

  • Support single-cell RNA-seq

When to Use#

Use the counting module when you want to quantify allele-specific expression and have:

  • Aligned reads (BAM file)

  • Variant calls (VCF file)

CLI Usage#

Basic Command#

wasp2-count count-variants BAM_FILE VCF_FILE

Full Options#

wasp2-count count-variants \
  input.bam \
  variants.vcf \
  --samples sample1,sample2 \
  --region genes.gtf \
  --out_file counts.tsv

Input Requirements#

BAM File#

  • Aligned reads (single-end or paired-end)

  • Indexed (.bai file in same directory)

  • Sorted by coordinate
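
A quick pre-flight check for the index requirement above can be sketched in Python. This is a minimal sketch, not part of the tool: the helper name `find_bam_index` is hypothetical, and it only looks for the two common index names (`sample.bam.bai` and `sample.bai`) next to the BAM file.

```python
from pathlib import Path
from typing import Optional


def find_bam_index(bam_path: str) -> Optional[Path]:
    """Return the path of a .bai index next to the BAM, or None if absent.

    Hypothetical helper: checks only the two usual naming conventions.
    """
    bam = Path(bam_path)
    candidates = [
        Path(str(bam) + ".bai"),   # sample.bam.bai
        bam.with_suffix(".bai"),   # sample.bai
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

If the function returns `None`, running `samtools index sample.bam` on a coordinate-sorted BAM normally creates the missing index.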

VCF File#

  • Variant calls with genotype information

  • Heterozygous SNPs (GT=0|1 or 1|0)

  • Can include sample-specific genotypes
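
To confirm that a VCF actually contains heterozygous genotypes for a sample, something like the following sketch may help. It is an illustration only and says nothing about how the tool parses VCFs: the function names are hypothetical, it reads an uncompressed VCF line by line, it does not check that a record is a SNP, and it assumes that unphased `0/1` and `1/0` should count as heterozygous alongside the phased `0|1` and `1|0` forms listed above.

```python
import re


def is_het(gt: str) -> bool:
    """True if GT has one reference (0) and one alternate (1) allele.

    Assumption: unphased genotypes (0/1) count as well as phased ones (0|1).
    """
    alleles = re.split(r"[|/]", gt)
    return len(alleles) == 2 and set(alleles) == {"0", "1"}


def count_het_sites(vcf_lines, sample: str) -> int:
    """Count records where `sample` has a heterozygous genotype."""
    sample_idx = None
    n_het = 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Column header: locate the sample column (ValueError if missing)
            sample_idx = fields.index(sample)
            continue
        if sample_idx is None:
            raise ValueError("#CHROM header line not found before records")
        fmt_keys = fields[8].split(":")
        if "GT" not in fmt_keys:
            continue  # record has no genotype information
        gt = fields[sample_idx].split(":")[fmt_keys.index("GT")]
        if is_het(gt):
            n_het += 1
    return n_het
```

For a plain-text file, the function can be called as `count_het_sites(open("variants.vcf"), "NA12878")`. A result of zero suggests that filtering on that sample would leave nothing to count.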

Optional: Region File#

Annotate SNPs overlapping genes/peaks:

  • GTF/GFF3 format (genes)

  • BED format (peaks, regions)

  • narrowPeak format (ATAC-seq, ChIP-seq)

Parameters#

--samples / -s#

Keep only SNPs that are heterozygous in the specified samples:

--samples sample1,sample2,sample3
# or
--samples samples.txt  # one per line

--region / -r#

Annotate SNPs with overlapping regions:

--region genes.gtf      # Gene annotations
--region peaks.bed      # ATAC-seq peaks
--region regions.gff3   # Custom regions

--out_file / -o#

Output file path (default: counts.tsv):

--out_file my_counts.tsv

Output Format#

Tab-separated file with columns:

Basic Columns#

  • chr: Chromosome

  • pos: SNP position (1-based)

  • ref: Reference allele

  • alt: Alternate allele

  • ref_count: Reads supporting reference

  • alt_count: Reads supporting alternate

  • other_count: Reads supporting other alleles
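
As an illustration of how these columns might be used downstream, the sketch below computes the reference allele fraction per SNP. It assumes the output file starts with a header row that uses exactly the column names listed above. The function name and the `min_total` threshold are hypothetical and not part of the tool.

```python
import csv


def ref_fractions(tsv_handle, min_total=10):
    """Yield (chr, pos, ref_fraction) for SNPs with enough ref+alt reads.

    Assumes a header row with chr, pos, ref_count and alt_count columns.
    `min_total` is an illustrative coverage cutoff, not a tool default.
    """
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    for row in reader:
        ref_n = int(row["ref_count"])
        alt_n = int(row["alt_count"])
        total = ref_n + alt_n
        if total < min_total:
            continue  # too few informative reads for a stable fraction
        yield row["chr"], int(row["pos"]), ref_n / total
```

A fraction near 0.5 is what a balanced heterozygous site would show. Formal testing for imbalance belongs to the analysis step, not to this kind of quick inspection.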

Optional Columns (with --region)#

  • gene_id: Overlapping gene

  • gene_name: Gene symbol

  • feature: Feature type (exon, intron, etc.)
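
With region annotation present, per-SNP counts can be summarized per gene. The following is a rough sketch under the assumption that the output has a header row containing `gene_id`, `ref_count` and `alt_count`. The function name is hypothetical. Note that a plain sum like this can count the same read more than once when it overlaps several SNPs in one gene.

```python
import csv
from collections import defaultdict


def counts_per_gene(tsv_handle):
    """Return {gene_id: (ref_total, alt_total)} summed over annotated SNPs."""
    totals = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        gene = row.get("gene_id")
        if not gene:
            continue  # SNP without an overlapping gene annotation
        totals[gene][0] += int(row["ref_count"])
        totals[gene][1] += int(row["alt_count"])
    return {gene: tuple(pair) for gene, pair in totals.items()}
```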

Example Workflow#

1. Basic Counting#

wasp2-count count-variants sample.bam variants.vcf

2. Filter by Sample#

wasp2-count count-variants \
  sample.bam \
  variants.vcf \
  --samples NA12878

3. Annotate with Genes#

wasp2-count count-variants \
  sample.bam \
  variants.vcf \
  --samples NA12878 \
  --region genes.gtf \
  --out_file counts_annotated.tsv

Single-Cell Counting#

For single-cell RNA-seq:

wasp2-count count-variants-sc \
  sc_rnaseq.bam \
  variants.vcf \
  --barcode_map barcodes.tsv

Output includes cell-type-specific counts.

Common Issues#

Low Count Numbers#

  • Check BAM file coverage (samtools depth)

  • Verify VCF contains heterozygous SNPs

  • Ensure BAM and VCF use same reference genome
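
One frequent symptom of mismatched references is a difference in chromosome naming (`chr1` in one file, `1` in the other). The sketch below compares two lists of contig names. It is illustrative only: the function names are hypothetical, and the names must be collected separately, for example from the first column of `samtools idxstats` output and from the CHROM column of the VCF. Matching names do not prove that both files use the same genome build.

```python
def strip_chr(name: str) -> str:
    """Drop a leading 'chr' prefix, if present."""
    return name[3:] if name.startswith("chr") else name


def compare_contigs(bam_contigs, vcf_contigs):
    """Report shared contig names and flag a likely 'chr' prefix mismatch."""
    bam_set, vcf_set = set(bam_contigs), set(vcf_contigs)
    shared = bam_set & vcf_set
    # Names that only match once the 'chr' prefix is ignored
    shared_ignoring_prefix = (
        {strip_chr(c) for c in bam_set} & {strip_chr(c) for c in vcf_set}
    )
    return {
        "shared": sorted(shared),
        "prefix_mismatch": not shared and bool(shared_ignoring_prefix),
    }
```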

No Output SNPs#

  • Check whether the --samples filter is too restrictive

  • Verify VCF has genotype information (GT field)

  • Ensure BAM file is indexed

Next Steps#

After counting:

  • Analysis Module - Detect allelic imbalance

  • Mapping Module (WASP) - Correct reference bias with WASP