Quickstart: Count Alleles in 5 Minutes

This tutorial demonstrates the basic WASP2 allele counting workflow using a minimal test dataset.

What you’ll learn:

How to count allele-specific reads from a BAM file
Basic WASP2 command-line usage
How to read the output format

Prerequisites:

WASP2 installed (pip install wasp2)
Basic familiarity with BAM and VCF file formats

Setup

First, verify WASP2 is installed:

wasp2-count --version
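If you prefer to check from Python, the following minimal sketch looks up the three command names mentioned in this tutorial on your PATH. It only confirms that the executables can be found; it does not run them or verify their versions, and the helper name `find_wasp2_commands` is made up for this example.

```python
import shutil

# Command names as they appear in this tutorial; adjust if your install differs.
WASP2_COMMANDS = ("wasp2-count", "wasp2-analyze", "wasp2-map")


def find_wasp2_commands(commands=WASP2_COMMANDS):
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in commands}


if __name__ == "__main__":
    for name, path in find_wasp2_commands().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

A `None` entry usually means the package was installed into a different environment than the one currently active.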

Test Data

We’ll use the minimal test data included in the WASP2 repository:

BAM file: Synthetic paired-end reads overlapping heterozygous variants
VCF file: 6 variants with genotypes for two samples
GTF file: Gene annotations for 3 genes

The test data is located in pipelines/nf-modules/tests/data/.

VCF contents:

#CHROM  POS  ID   REF  ALT  QUAL  FILTER  INFO    FORMAT  sample1  sample2
chr1    100  rs1  A    G    30    PASS    DP=50   GT      0/1      0/0
chr1    200  rs2  C    T    30    PASS    DP=45   GT      1/1      0/1
chr1    300  rs3  G    A    30    PASS    DP=60   GT      0/0      1/1
chr1    400  rs4  T    C    30    PASS    DP=55   GT      0/1      0/1
chr2    100  rs5  A    T    30    PASS    DP=40   GT      0/1      0/0
chr2    200  rs6  G    C    30    PASS    DP=35   GT      ./.      0/1

The GT field shows genotypes:

0/1: Heterozygous (has both reference and alternate alleles)
0/0: Homozygous reference
1/1: Homozygous alternate
./.: Missing genotype (no call)

For allele-specific analysis, we focus on heterozygous sites (0/1), because only at these sites can a read be assigned to one of two distinct alleles.
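To make the genotype categories concrete, here is a small sketch that classifies GT strings and picks out the heterozygous variants for sample1 from the table above. The function name `classify_genotype` is hypothetical, and treating phased genotypes (`0|1`) the same as unphased ones is an assumption of this example, not a statement about how WASP2 handles them internally.

```python
import re


def classify_genotype(gt):
    """Classify a diploid VCF GT string such as '0/1', '1|0', or './.'."""
    alleles = re.split(r"[/|]", gt)
    if "." in alleles:
        return "missing"
    if len(set(alleles)) > 1:
        return "heterozygous"
    return "hom_ref" if alleles[0] == "0" else "hom_alt"


# sample1 genotypes copied from the VCF contents above, keyed by variant ID.
sample1 = {
    "rs1": "0/1", "rs2": "1/1", "rs3": "0/0",
    "rs4": "0/1", "rs5": "0/1", "rs6": "./.",
}
het_ids = [vid for vid, gt in sample1.items() if classify_genotype(gt) == "heterozygous"]
print(het_ids)  # ['rs1', 'rs4', 'rs5']
```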

Step 1: Basic Allele Counting

The simplest way to count alleles is to provide a BAM file and VCF file:

wasp2-count count-variants \
  pipelines/nf-modules/tests/data/minimal.bam \
  pipelines/nf-modules/tests/data/sample.vcf.gz \
  --out_file counts_basic.tsv

Output (counts_basic.tsv):

chr   pos  ref  alt  ref_count  alt_count  other_count
chr1  100  A    G    1          0          0
chr1  400  T    C    1          0          0
chr2  100  A    T    1          0          0

Output Columns

Column	Description
`chr`	Chromosome
`pos`	Variant position (1-based)
`ref`	Reference allele
`alt`	Alternate allele
`ref_count`	Reads supporting reference allele
`alt_count`	Reads supporting alternate allele
`other_count`	Reads with other alleles (errors, indels)
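As a rough illustration of how these columns can be used downstream, the sketch below parses the Step 1 output and derives a total depth and reference-allele fraction per site. It assumes the file is tab-delimited with the header row shown above; the helper name `summarize_counts` is invented for this example, and the rows are embedded as a string so the snippet runs without the test data.

```python
import csv
import io

# The Step 1 output shown above, tab-separated as in a .tsv file.
COUNTS_TSV = (
    "chr\tpos\tref\talt\tref_count\talt_count\tother_count\n"
    "chr1\t100\tA\tG\t1\t0\t0\n"
    "chr1\t400\tT\tC\t1\t0\t0\n"
    "chr2\t100\tA\tT\t1\t0\t0\n"
)


def summarize_counts(handle):
    """Yield (chr, pos, total, ref_fraction); ref_fraction is None when ref + alt is 0."""
    for row in csv.DictReader(handle, delimiter="\t"):
        ref, alt = int(row["ref_count"]), int(row["alt_count"])
        total = ref + alt
        ref_fraction = ref / total if total else None
        yield row["chr"], int(row["pos"]), total, ref_fraction


summary = list(summarize_counts(io.StringIO(COUNTS_TSV)))
for chrom, pos, total, ref_fraction in summary:
    print(chrom, pos, total, ref_fraction)
```

To read a real file instead, pass `open("counts_basic.tsv", newline="")` as the handle. With a single read per site, as in the test data, the fraction carries no statistical weight; it is shown only to illustrate the arithmetic.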

Step 2: Filter by Sample

When your VCF contains multiple samples, use --samples to filter for heterozygous sites in a specific sample:

wasp2-count count-variants \
  pipelines/nf-modules/tests/data/minimal.bam \
  pipelines/nf-modules/tests/data/sample.vcf.gz \
  --samples sample1 \
  --out_file counts_sample1.tsv

This returns only the 3 sites where sample1 is heterozygous:

chr1:100 (rs1)
chr1:400 (rs4)
chr2:100 (rs5)
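If you want to predict which sites a given sample should contribute before running the command, a standalone sketch like the one below lists heterozygous sites straight from a gzipped VCF. This is not WASP2's own filtering code; `het_sites` is a hypothetical helper that assumes a tab-delimited VCF with GT as the first FORMAT key and counts any two different called alleles as heterozygous.

```python
import gzip


def het_sites(vcf_path, sample):
    """Return [(chrom, pos, id)] where `sample` has two different called alleles."""
    sites = []
    columns = None
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#CHROM"):
                columns = fields  # header row names the sample columns
                continue
            if columns is None:
                raise ValueError("no #CHROM header line before the first record")
            record = dict(zip(columns, fields))
            # Assumption: GT is the first colon-separated FORMAT value.
            gt = record[sample].split(":")[0]
            alleles = gt.replace("|", "/").split("/")
            if "." not in alleles and len(set(alleles)) > 1:
                sites.append((record["#CHROM"], int(record["POS"]), record["ID"]))
    return sites
```

Calling `het_sites("pipelines/nf-modules/tests/data/sample.vcf.gz", "sample1")` should list rs1, rs4, and rs5, provided the file matches the VCF contents shown earlier.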

Step 3: Annotate with Gene Regions

Use --region to annotate variants with overlapping genomic features (genes, peaks, etc.):

wasp2-count count-variants \
  pipelines/nf-modules/tests/data/minimal.bam \
  pipelines/nf-modules/tests/data/sample.vcf.gz \
  --samples sample1 \
  --region pipelines/nf-modules/tests/data/sample.gtf \
  --out_file counts_annotated.tsv

The output now includes gene annotations from the GTF file, allowing you to aggregate counts per gene for downstream analysis.
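Per-gene aggregation could look roughly like the sketch below. The exact name of the annotation column WASP2 adds is not shown in this tutorial, so the example assumes a hypothetical `gene_id` column, and the gene labels (`geneA`, `geneB`) are invented placeholders attached to the Step 1 counts purely for illustration. Check the header of your own counts_annotated.tsv and pass the real column name.

```python
import csv
import io
from collections import defaultdict

# Hypothetical annotated rows: Step 1 counts plus an ASSUMED `gene_id` column
# with made-up gene labels. The real column name and values may differ.
ANNOTATED_TSV = (
    "chr\tpos\tref\talt\tref_count\talt_count\tother_count\tgene_id\n"
    "chr1\t100\tA\tG\t1\t0\t0\tgeneA\n"
    "chr1\t400\tT\tC\t1\t0\t0\tgeneA\n"
    "chr2\t100\tA\tT\t1\t0\t0\tgeneB\n"
)


def counts_per_gene(handle, gene_column="gene_id"):
    """Sum ref_count and alt_count over all variants sharing a gene label."""
    totals = defaultdict(lambda: {"ref_count": 0, "alt_count": 0})
    for row in csv.DictReader(handle, delimiter="\t"):
        gene = row[gene_column]
        totals[gene]["ref_count"] += int(row["ref_count"])
        totals[gene]["alt_count"] += int(row["alt_count"])
    return dict(totals)


per_gene = counts_per_gene(io.StringIO(ANNOTATED_TSV))
print(per_gene)
```

Summing raw counts across variants is the simplest possible aggregation; it ignores phasing, so it is only meaningful when the reference allele at every variant in a gene sits on the same haplotype.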

Next Steps

Now that you have allele counts, you can:

Analyze allelic imbalance using wasp2-analyze find-imbalance
Compare between conditions using wasp2-analyze compare-imbalance
Correct mapping bias using wasp2-map (for WASP-filtered BAMs)