Quickstart: Count Alleles in 5 Minutes
This tutorial demonstrates the basic WASP2 allele counting workflow using a minimal test dataset.
What you’ll learn:
How to count allele-specific reads from a BAM file
Basic WASP2 command-line usage
How to read the output format
Prerequisites:
WASP2 installed (pip install wasp2)
Basic familiarity with BAM and VCF file formats
Setup
First, verify WASP2 is installed:
wasp2-count --version
Test Data
We’ll use the minimal test data included in the WASP2 repository:
BAM file: Synthetic paired-end reads overlapping heterozygous variants
VCF file: 6 variants with genotypes for two samples
GTF file: Gene annotations for 3 genes
The test data is located in pipelines/nf-modules/tests/data/.
VCF contents:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample2
chr1 100 rs1 A G 30 PASS DP=50 GT 0/1 0/0
chr1 200 rs2 C T 30 PASS DP=45 GT 1/1 0/1
chr1 300 rs3 G A 30 PASS DP=60 GT 0/0 1/1
chr1 400 rs4 T C 30 PASS DP=55 GT 0/1 0/1
chr2 100 rs5 A T 30 PASS DP=40 GT 0/1 0/0
chr2 200 rs6 G C 30 PASS DP=35 GT ./. 0/1
The GT field shows genotypes:
0/1: Heterozygous (has both reference and alternate alleles)
0/0: Homozygous reference
1/1: Homozygous alternate
For allele-specific analysis, we focus on heterozygous sites (0/1).
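The genotype rule above can be sketched in a few lines of Python. This is only an illustration of how heterozygous sites are identified from the GT field, not WASP2's own implementation; the helper name `het_sites` is hypothetical, and the records are the ones listed above, embedded as whitespace-separated text (a real VCF is tab-delimited, which `split()` also handles).

```python
# The VCF records from this tutorial, embedded as text for illustration.
VCF_TEXT = """\
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample2
chr1 100 rs1 A G 30 PASS DP=50 GT 0/1 0/0
chr1 200 rs2 C T 30 PASS DP=45 GT 1/1 0/1
chr1 300 rs3 G A 30 PASS DP=60 GT 0/0 1/1
chr1 400 rs4 T C 30 PASS DP=55 GT 0/1 0/1
chr2 100 rs5 A T 30 PASS DP=40 GT 0/1 0/0
chr2 200 rs6 G C 30 PASS DP=35 GT ./. 0/1
"""


def het_sites(vcf_text, sample):
    """Return (chrom, pos, id) for records where `sample` is heterozygous.

    Hypothetical helper: a site counts as heterozygous when the GT field
    holds exactly one reference (0) and one alternate (1) allele.
    """
    sites = []
    columns = None
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split()
        if line.startswith("#CHROM"):
            columns = fields  # header row names the sample columns
            continue
        if columns is None:
            raise ValueError("VCF header line (#CHROM ...) not found")
        record = dict(zip(columns, fields))
        # Locate GT within the colon-separated FORMAT keys.
        gt_index = record["FORMAT"].split(":").index("GT")
        genotype = record[sample].split(":")[gt_index]
        # Treat phased (|) and unphased (/) genotypes alike.
        alleles = genotype.replace("|", "/").split("/")
        if sorted(alleles) == ["0", "1"]:
            sites.append((record["#CHROM"], int(record["POS"]), record["ID"]))
    return sites


for name in ("sample1", "sample2"):
    print(name, het_sites(VCF_TEXT, name))
```

Missing genotypes (./.) and homozygous calls are excluded because they do not contain one 0 and one 1 allele.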
Step 1: Basic Allele Counting
The simplest way to count alleles is to provide a BAM file and VCF file:
wasp2-count count-variants \
pipelines/nf-modules/tests/data/minimal.bam \
pipelines/nf-modules/tests/data/sample.vcf.gz \
--out_file counts_basic.tsv
Output:
chr pos ref alt ref_count alt_count other_count
chr1 100 A G 1 0 0
chr1 400 T C 1 0 0
chr2 100 A T 1 0 0
Output Columns
| Column | Description |
|---|---|
| chr | Chromosome |
| pos | Variant position (1-based) |
| ref | Reference allele |
| alt | Alternate allele |
| ref_count | Reads supporting reference allele |
| alt_count | Reads supporting alternate allele |
| other_count | Reads with other alleles (errors, indels) |
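As a rough sketch of how this output might be consumed downstream, the snippet below parses a counts table with Python's standard `csv` module and derives a per-site reference fraction. It assumes the file is tab-delimited with the header shown in the Step 1 output; the rows are the Step 1 output embedded as a string, and `read_counts` and `ref_fraction` are hypothetical names, not part of WASP2.

```python
import csv
import io

# Step 1 output, embedded as tab-separated text (assumed delimiter).
COUNTS_TSV = (
    "chr\tpos\tref\talt\tref_count\talt_count\tother_count\n"
    "chr1\t100\tA\tG\t1\t0\t0\n"
    "chr1\t400\tT\tC\t1\t0\t0\n"
    "chr2\t100\tA\tT\t1\t0\t0\n"
)


def read_counts(handle):
    """Parse a counts table and add a reference-allele fraction per site."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        ref = int(row["ref_count"])
        alt = int(row["alt_count"])
        total = ref + alt  # other_count is excluded from the fraction
        rows.append({
            "site": f'{row["chr"]}:{row["pos"]}',
            "ref_count": ref,
            "alt_count": alt,
            "other_count": int(row["other_count"]),
            # None when no informative reads cover the site
            "ref_fraction": ref / total if total else None,
        })
    return rows


rows = read_counts(io.StringIO(COUNTS_TSV))
for r in rows:
    print(r["site"], r["ref_count"], r["alt_count"], r["ref_fraction"])
```

To read a real file, pass `open("counts_basic.tsv", newline="")` in place of the `io.StringIO` object.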
Step 2: Filter by Sample
When your VCF contains multiple samples, use --samples to filter for heterozygous sites in a specific sample:
wasp2-count count-variants \
pipelines/nf-modules/tests/data/minimal.bam \
pipelines/nf-modules/tests/data/sample.vcf.gz \
--samples sample1 \
--out_file counts_sample1.tsv
This returns only the 3 sites where sample1 is heterozygous:
chr1:100 (rs1)
chr1:400 (rs4)
chr2:100 (rs5)
Step 3: Annotate with Gene Regions
Use --region to annotate variants with overlapping genomic features (genes, peaks, etc.):
wasp2-count count-variants \
pipelines/nf-modules/tests/data/minimal.bam \
pipelines/nf-modules/tests/data/sample.vcf.gz \
--samples sample1 \
--region pipelines/nf-modules/tests/data/sample.gtf \
--out_file counts_annotated.tsv
The output now includes gene annotations from the GTF file, allowing you to aggregate counts per gene for downstream analysis.
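Per-gene aggregation could look roughly like the sketch below. The exact name of the annotation column in the WASP2 output is not shown in this tutorial, so `gene_id` is an assumed column name, and the gene labels `geneA` and `geneB` are placeholders invented for illustration rather than values from the test GTF. Only the counts come from the Step 1 output.

```python
from collections import defaultdict

# Counts from the Step 1 output; the gene labels are made-up placeholders.
annotated_rows = [
    {"site": "chr1:100", "gene_id": "geneA", "ref_count": 1, "alt_count": 0},
    {"site": "chr1:400", "gene_id": "geneA", "ref_count": 1, "alt_count": 0},
    {"site": "chr2:100", "gene_id": "geneB", "ref_count": 1, "alt_count": 0},
]


def aggregate_by_gene(rows, gene_key="gene_id"):
    """Sum ref/alt counts over all sites sharing the same gene label.

    `gene_key` is an assumption; adjust it to the column the real
    annotated output uses.
    """
    totals = defaultdict(lambda: {"ref_count": 0, "alt_count": 0, "n_sites": 0})
    for row in rows:
        gene = totals[row[gene_key]]
        gene["ref_count"] += row["ref_count"]
        gene["alt_count"] += row["alt_count"]
        gene["n_sites"] += 1
    return dict(totals)


per_gene = aggregate_by_gene(annotated_rows)
for gene, counts in per_gene.items():
    print(gene, counts)
```

Summing across sites in this way treats every site in a gene as coming from the same haplotype orientation, which is a simplification; the wasp2-analyze commands listed under Next Steps are the intended route for actual imbalance testing.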
Next Steps
Now that you have allele counts, you can:
Analyze allelic imbalance using wasp2-analyze find-imbalance
Compare between conditions using wasp2-analyze compare-imbalance
Correct mapping bias using wasp2-map (for WASP-filtered BAMs)
See Also
Counting Module - Detailed counting options
10X scRNA-seq Tutorial - Single-cell RNA-seq tutorial
Comparative Imbalance Analysis Tutorial - Differential imbalance analysis