Mapping Module (WASP)#
Overview#
The WASP (Weighted Allele-Specific Mapping) algorithm corrects reference bias by remapping each SNP-overlapping read with its alleles swapped and keeping only reads whose mapping location does not depend on the allele.
What is Reference Bias?#
Reference bias occurs when reads containing alternate alleles align worse than reads with reference alleles, leading to false allelic imbalance signals.
WASP Solution#
Identify reads overlapping heterozygous SNPs
Generate alternative reads (swap alleles)
Remap both original and swapped reads
Keep only reads that map to the same location
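The allele swap in the second step above can be sketched on a single read. This is a toy illustration, not WASP2's implementation: `swap_allele` is a hypothetical helper, and it assumes the SNP's offset within the read is already known, whereas a real tool would derive it from the alignment.

```shell
# Toy illustration of the allele swap (hypothetical helper, not WASP2 code).
# swap_allele SEQ OFFSET REF ALT
#   SEQ      read sequence
#   OFFSET   1-based position of the SNP within the read (assumed known)
#   REF/ALT  the two alleles at that SNP
# Prints the read with the other allele substituted. Prints nothing and
# returns 1 if the read carries neither allele at that offset.
swap_allele() {
  awk -v seq="$1" -v off="$2" -v ref="$3" -v alt="$4" 'BEGIN {
    base = substr(seq, off, 1)
    if (base == ref)      new = alt
    else if (base == alt) new = ref
    else                  exit 1
    print substr(seq, 1, off - 1) new substr(seq, off + 1)
  }'
}

# The read carries T at offset 4, so C is substituted.
swap_allele ACGTACGT 4 T C   # prints ACGCACGT
```

Both the original and the swapped sequence are then remapped, and the read is kept only if both land at the same location.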
Purpose#
Correct reference bias in RNA-seq and ATAC-seq data
Improve accuracy of allelic imbalance detection
Required before allele counting
When to Use#
Use WASP when:
* Reads will be used for allelic analysis
* Reference genome differs from sample genotype
* High-confidence bias correction needed
Workflow#
The complete WASP workflow has three steps:
Step 1: Find Intersecting SNPs#
Identify reads overlapping heterozygous SNPs:
wasp2-map find-intersecting-snps \
input.bam \
variants.vcf \
--output intersecting.bam
Output: BAM file with reads overlapping SNPs.
Step 2: Generate Remapping Reads#
Create reads with swapped alleles:
wasp2-map make-reads \
intersecting.bam \
variants.vcf \
--samples sample1 \
--output remap_reads.fastq
Output: FASTQ file(s) with alternative allele sequences.
Step 3: Remap and Filter#
Remap the generated reads with your own aligner (BWA, STAR, etc.), using the same reference and settings as the original alignment:
# Example with BWA
bwa mem -t 8 reference.fa remap_reads.fastq | \
samtools sort -o remapped.bam -
Then filter to consistent mappings:
wasp2-map filt-remapped-reads \
intersecting.bam \
remapped.bam \
--output filtered.bam
Output: BAM file with bias-corrected reads.
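The "consistent mappings" rule can be sketched on plain tables of read name, chromosome, and position. This is a simplified model rather than WASP2's filter: `keep_consistent` is a hypothetical helper, it assumes remapped reads keep the original read names, and it compares only chromosome and start position.

```shell
# keep_consistent ORIG.tsv REMAP.tsv   (hypothetical helper, not WASP2 code)
# Both files: read_name <TAB> chrom <TAB> pos, one line per alignment.
# Prints the names of reads whose every remapped copy matches the original
# location. Reads with no remapped copy at all are dropped as well.
keep_consistent() {
  awk -F'\t' '
    NR == FNR { loc[$1] = $2 FS $3; next }       # first file: original locations
    ($1 in loc) {
      seen[$1] = 1
      if (($2 FS $3) != loc[$1]) bad[$1] = 1      # any mismatch discards the read
    }
    END {
      for (r in seen) if (!(r in bad)) print r
    }' "$1" "$2" | sort
}
```

Under those assumptions, input tables of this shape could be produced with `samtools view in.bam | cut -f1,3,4`, since columns 1, 3, and 4 of a SAM record are the read name, reference name, and position.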
CLI Reference#
find-intersecting-snps#
wasp2-map find-intersecting-snps [OPTIONS] BAM VCF
Options:
* --samples: Filter by sample genotype
* --output: Output BAM file
make-reads#
wasp2-map make-reads [OPTIONS] BAM VCF
Options:
* --samples: Sample name(s)
* --output: Output FASTQ prefix
* --paired: Paired-end mode
filt-remapped-reads#
wasp2-map filt-remapped-reads [OPTIONS] ORIGINAL REMAPPED
Options:
* --output: Filtered BAM file
* --keep_read_file: Save kept read IDs
Input Requirements#
Original BAM: Aligned reads from initial mapping
VCF File: Heterozygous SNPs for the sample (phased genotypes recommended)
Reference Genome: Same as used for original alignment
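A quick way to check the VCF requirement is to count how many heterozygous genotypes are phased. The sketch below assumes an uncompressed, single-sample VCF with the sample in column 10; `count_het_phasing` is a hypothetical helper name, not part of WASP2.

```shell
# count_het_phasing VCF   (hypothetical helper, not WASP2 code)
# Inspects the GT field of the first sample column (column 10).
# Prints: phased_het=<n> unphased_het=<n>
count_het_phasing() {
  awk -F'\t' '
    # True when gt splits into two different, non-missing alleles on sep.
    function is_het(gt, sep,    a) {
      return split(gt, a, sep) == 2 && a[1] != a[2] && a[1] != "." && a[2] != "."
    }
    /^#/ { next }                                  # skip header lines
    {
      n = split($9, fmt, ":"); split($10, val, ":")
      gt = ""
      for (i = 1; i <= n; i++) if (fmt[i] == "GT") gt = val[i]
      if (is_het(gt, "[|]"))      phased++         # e.g. 0|1
      else if (is_het(gt, "[/]")) unphased++       # e.g. 0/1
    }
    END { printf "phased_het=%d unphased_het=%d\n", phased, unphased }' "$1"
}
```

A large `unphased_het` count suggests the file has not been phased, which matters if downstream analysis relies on haplotypes.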
Output Interpretation#
WASP Filter Rate#
Typical rates of reads kept after filtering:
* Good: 95-99% reads kept
* Acceptable: 90-95% reads kept
* Concerning: <90% reads kept (check data quality)
A low retention rate (many reads filtered out) may indicate:
* Poor mapping quality
* High SNP density
* Problematic reference genome
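The bands listed above can be applied to a pair of read counts with a small helper. `keep_rate` is a hypothetical name. The sketch treats 95% and 90% as inclusive lower bounds, and it assumes the rate is measured against the reads that entered the filter, which the text above does not state explicitly.

```shell
# keep_rate KEPT TOTAL   (hypothetical helper, not WASP2 code)
# Prints the percentage of reads kept and the matching band.
keep_rate() {
  awk -v kept="$1" -v total="$2" 'BEGIN {
    if (total <= 0) { print "no reads"; exit 1 }
    pct = 100 * kept / total
    if (pct >= 95)      band = "good"
    else if (pct >= 90) band = "acceptable"
    else                band = "concerning"
    printf "%.1f%% kept (%s)\n", pct, band
  }'
}

keep_rate 970 1000   # prints 97.0% kept (good)
```

With samtools installed, the two counts could be obtained with `samtools view -c filtered.bam` and `samtools view -c intersecting.bam`.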
Complete Example#
Full WASP workflow:
# Step 1: Find SNP-overlapping reads
wasp2-map find-intersecting-snps \
original.bam \
phased_variants.vcf \
--samples NA12878 \
--output intersecting.bam
# Step 2: Generate remapping reads
wasp2-map make-reads \
intersecting.bam \
phased_variants.vcf \
--samples NA12878 \
--paired \
--output remap
# Step 3: Remap (user's aligner)
bwa mem -t 8 reference.fa \
remap_R1.fastq remap_R2.fastq | \
samtools sort -o remapped.bam -
samtools index remapped.bam
# Step 4: Filter
wasp2-map filt-remapped-reads \
intersecting.bam \
remapped.bam \
--output filtered_wasp.bam
# Step 5: Count alleles (use filtered BAM)
wasp2-count count-variants \
filtered_wasp.bam \
phased_variants.vcf \
--samples NA12878
Performance Tips#
Use multi-threading for remapping step
Filter VCF to high-quality SNPs only
Use phased genotypes when available
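Restricting the VCF to high-quality SNPs can be sketched with plain awk on an uncompressed VCF. The criteria here are assumptions chosen for illustration, not WASP2 recommendations: single-base REF and ALT, a FILTER value of `PASS` or `.`, and a QUAL threshold that defaults to 30. `filter_snps` is a hypothetical helper name.

```shell
# filter_snps VCF [MIN_QUAL]   (hypothetical helper, not WASP2 code)
# Keeps all header lines and only records that look like high-quality
# biallelic SNPs. MIN_QUAL defaults to 30, an arbitrary example value.
filter_snps() {
  awk -F'\t' -v minq="${2:-30}" '
    /^#/ { print; next }                           # pass headers through
    $4 ~ /^[ACGT]$/ && $5 ~ /^[ACGT]$/ &&          # single-base REF and ALT
    ($7 == "PASS" || $7 == ".") &&                 # no failed filters
    $6 != "." && $6 + 0 >= minq                    # QUAL present and high enough
  ' "$1"
}

# Example: filter_snps variants.vcf 30 > variants.hq.vcf
```

A smaller SNP set means fewer reads to swap and remap, which is where most of the run time goes.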
Common Issues#
Many Reads Filtered#
Check remapping quality (MAPQ scores)
Verify that the same reference genome was used for the original alignment and the remapping
Consider relaxing mapping parameters
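To check remapping quality, the MAPQ column (column 5 of each SAM record) can be summarised directly. `low_mapq_fraction` is a hypothetical helper, and the default cutoff of 10 is an arbitrary example value.

```shell
# low_mapq_fraction [MIN_MAPQ] < SAM records   (hypothetical helper)
# Reads SAM text on stdin and reports how many records fall below MIN_MAPQ.
low_mapq_fraction() {
  awk -F'\t' -v minq="${1:-10}" '
    /^@/ { next }                                  # skip SAM header lines
    { total++; if ($5 + 0 < minq) low++ }          # column 5 is MAPQ
    END { if (total) printf "%d/%d reads below MAPQ %d\n", low, total, minq }'
}

# Example: samtools view remapped.bam | low_mapq_fraction 10
```

A high fraction of low-MAPQ reads after remapping points to ambiguous alignments rather than a problem with the filter itself.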
Slow Remapping#
Use multi-threading (-t flag)
Process chromosomes in parallel
Consider downsampling for testing
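For a quick test run, the remapping FASTQ can be thinned deterministically by keeping every Nth record. This sketch assumes standard four-line FASTQ records. `every_nth_read` is a hypothetical helper, and for paired-end data the same N would need to be applied to both files so that mates stay in sync.

```shell
# every_nth_read N < in.fastq > out.fastq   (hypothetical helper)
# Keeps records 1, N+1, 2N+1, ... assuming four lines per FASTQ record.
every_nth_read() {
  awk -v n="$1" '(int((NR - 1) / 4) % n) == 0'
}

# Example: every_nth_read 10 < remap_reads.fastq > remap_subset.fastq
```

Results from a subset only indicate whether the pipeline runs end to end, so the full data set is still needed for the actual analysis.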
Next Steps#
Counting Module - Count alleles from WASP-filtered BAM
Analysis Module - Analyze allelic imbalance