Quickstart: WASP Mapping Filter#
Learn WASP2’s mapping bias correction in 5 minutes.
Overview#
Goal: Understand and apply the WASP mapping filter to remove reference bias from your alignment data.
Time: ~5 minutes to read, ~30 minutes to run on typical data
Prerequisites:
WASP2 installed (
pip install wasp2)Aligned BAM file (coordinate-sorted)
VCF file with heterozygous variants
The Problem: Reference Mapping Bias#
When reads are aligned to a reference genome, there’s an inherent asymmetry:
Reference: ...ACGT[A]CGTA... (reference allele: A)
Read (ref): ...ACGT[A]CGTA... → Perfect match (0 mismatches)
Read (alt): ...ACGT[G]CGTA... → 1 mismatch penalty
Result: Reads carrying the alternate allele are more likely to:
Fail to map entirely
Map with lower quality scores
Map to incorrect locations
This causes inflated reference allele counts, leading to false positive ASE signals.
The Solution: WASP Remap-and-Filter#
WASP corrects this by testing whether each read would map identically regardless of which allele it carries:
Identify: Find reads overlapping heterozygous SNPs
Swap: Create versions with alleles swapped (ref→alt, alt→ref)
Remap: Align swapped reads with the same aligner
Filter: Keep only reads that map to the same location after swapping
After filtering, the probability of mapping is equal for both alleles:
Quick Workflow#
Step 1: Create Swapped Reads#
Identify reads overlapping heterozygous SNPs and generate allele-swapped versions:
wasp2-map make-reads sample.bam variants.vcf.gz \
--samples SAMPLE1 \
--out-dir wasp_output/
This produces (where sample is your BAM file prefix):
wasp_output/sample_to_remap.bam: Original reads needing remappingwasp_output/sample_keep.bam: Reads not overlapping variants (kept as-is)wasp_output/sample_swapped_alleles_r1.fq: Allele-swapped read 1wasp_output/sample_swapped_alleles_r2.fq: Allele-swapped read 2wasp_output/sample_wasp_data_files.json: Metadata for filter step
Step 2: Remap Swapped Reads#
Critical: Use the same aligner and parameters as your original mapping!
# Example with BWA (replace 'sample' with your BAM file prefix)
bwa mem -M -t 8 genome.fa \
wasp_output/sample_swapped_alleles_r1.fq \
wasp_output/sample_swapped_alleles_r2.fq | \
samtools sort -o wasp_output/sample_remapped.bam -
samtools index wasp_output/sample_remapped.bam
Using different alignment parameters will invalidate the WASP correction.
Step 3: Filter Remapped Reads#
The WASP filter compares original and remapped positions:
wasp2-map filter-remapped \
wasp_output/sample_to_remap.bam \
wasp_output/sample_remapped.bam \
wasp_output/sample_wasp_filtered.bam
Understanding Filter Statistics#
The WASP filter reports three key metrics:
Interpreting Filter Rates#
95-99% kept: Good - typical for most data types
90-95% kept: Acceptable - may indicate difficult regions
<90% kept: Investigate - check data quality or variant calls
Before/After Example#
At a site with mapping bias:
Metric |
Before WASP |
After WASP |
|---|---|---|
Reference reads |
150 |
95 |
Alternate reads |
80 |
85 |
Ref fraction |
0.65 |
0.53 |
The biased site (0.65 ref fraction) is corrected to near-balanced (0.53).
Complete Workflow Script#
#!/bin/bash
set -e
# Input files
BAM="sample.bam"
VCF="variants.vcf.gz"
SAMPLE="SAMPLE1"
GENOME="genome.fa"
OUTDIR="wasp_output"
# Extract BAM prefix (filename without .bam extension)
PREFIX=$(basename $BAM .bam)
mkdir -p $OUTDIR
# Step 1: Create allele-swapped reads
echo "Step 1: Creating swapped reads..."
wasp2-map make-reads $BAM $VCF \
--samples $SAMPLE \
--out-dir $OUTDIR/
# Step 2: Remap with same aligner
echo "Step 2: Remapping swapped reads..."
bwa mem -M -t 8 $GENOME \
$OUTDIR/${PREFIX}_swapped_alleles_r1.fq \
$OUTDIR/${PREFIX}_swapped_alleles_r2.fq | \
samtools sort -o $OUTDIR/${PREFIX}_remapped.bam -
samtools index $OUTDIR/${PREFIX}_remapped.bam
# Step 3: Filter biased reads
echo "Step 3: Filtering biased reads..."
wasp2-map filter-remapped \
$OUTDIR/${PREFIX}_to_remap.bam \
$OUTDIR/${PREFIX}_remapped.bam \
$OUTDIR/${PREFIX}_wasp_filtered.bam
# Step 4: Merge with non-overlapping reads
echo "Step 4: Merging final BAM..."
samtools merge -f $OUTDIR/${PREFIX}_final.bam \
$OUTDIR/${PREFIX}_wasp_filtered.bam \
$OUTDIR/${PREFIX}_keep.bam
samtools index $OUTDIR/${PREFIX}_final.bam
echo "Done! WASP-filtered BAM: $OUTDIR/${PREFIX}_final.bam"
Rust Acceleration#
WASP2 includes a high-performance Rust backend that accelerates the filter step:
Dataset Size |
Python |
Rust |
|---|---|---|
1M reads |
~5 minutes |
~30 seconds |
10M reads |
~50 minutes |
~5 minutes |
100M reads |
~8 hours |
~50 minutes |
The Rust backend is used automatically when available.
Next Steps#
After WASP filtering:
Count alleles on the filtered BAM:
wasp2-count count-variants wasp_filtered.bam variants.vcf
Analyze allelic imbalance:
wasp2-analyze find-imbalance counts.tsv
See Also#
Mapping Module (WASP) - Detailed mapping module documentation
WASP Mapping Bias Correction - Algorithm details and mathematics
10X scRNA-seq Tutorial - Single-cell RNA-seq workflow
Summary#
Concept |
Key Point |
|---|---|
Problem |
Reference bias inflates ref allele counts |
Solution |
WASP remap-and-filter removes biased reads |
Workflow |
make-reads → remap → filter-remapped |
Expected |
90-99% reads pass filter |
Result |
Unbiased allele counts for ASE analysis |