Quickstart: Count Alleles in 5 Minutes
======================================

This tutorial demonstrates the basic WASP2 allele counting workflow using a minimal test dataset.

**What you'll learn:**

- How to count allele-specific reads from a BAM file
- Basic WASP2 command-line usage
- How to read the output format

**Prerequisites:**

- WASP2 installed (``pip install wasp2``)
- Basic familiarity with BAM and VCF file formats

Setup
-----

First, verify WASP2 is installed:

.. code-block:: bash

   wasp2-count --version

Test Data
---------

We'll use the minimal test data included in the WASP2 repository:

- **BAM file**: Synthetic paired-end reads overlapping heterozygous variants
- **VCF file**: 6 variants with genotypes for two samples
- **GTF file**: Gene annotations for 3 genes

The test data is located in ``pipelines/nf-modules/tests/data/``.

**VCF contents:**

.. code-block:: text

   #CHROM  POS  ID   REF  ALT  QUAL  FILTER  INFO    FORMAT  sample1  sample2
   chr1    100  rs1  A    G    30    PASS    DP=50   GT      0/1      0/0
   chr1    200  rs2  C    T    30    PASS    DP=45   GT      1/1      0/1
   chr1    300  rs3  G    A    30    PASS    DP=60   GT      0/0      1/1
   chr1    400  rs4  T    C    30    PASS    DP=55   GT      0/1      0/1
   chr2    100  rs5  A    T    30    PASS    DP=40   GT      0/1      0/0
   chr2    200  rs6  G    C    30    PASS    DP=35   GT      ./.      0/1

The ``GT`` field shows genotypes:

- ``0/1``: Heterozygous (has both reference and alternate alleles)
- ``0/0``: Homozygous reference
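- ``1/1``: Homozygous alternate
- ``./.``: Missing genotype (no call), as for sample1 at chr2:200

For allele-specific analysis, we focus on **heterozygous sites** (``0/1``), because only these carry two distinguishable alleles in the same sample.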
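
To make the heterozygous-site rule concrete, here is a minimal Python sketch that parses the six records shown above and lists the sites where each sample is heterozygous. It is an illustration using only the standard library, not a description of how WASP2 selects sites internally. The helper name ``het_sites`` is hypothetical, and the records are pasted inline rather than read from ``sample.vcf.gz``.

.. code-block:: python

   # Illustrative only: the six records from the listing above, pasted inline.
   VCF_TEXT = """\
   #CHROM  POS  ID   REF  ALT  QUAL  FILTER  INFO    FORMAT  sample1  sample2
   chr1    100  rs1  A    G    30    PASS    DP=50   GT      0/1      0/0
   chr1    200  rs2  C    T    30    PASS    DP=45   GT      1/1      0/1
   chr1    300  rs3  G    A    30    PASS    DP=60   GT      0/0      1/1
   chr1    400  rs4  T    C    30    PASS    DP=55   GT      0/1      0/1
   chr2    100  rs5  A    T    30    PASS    DP=40   GT      0/1      0/0
   chr2    200  rs6  G    C    30    PASS    DP=35   GT      ./.      0/1
   """


   def het_sites(vcf_text, sample):
       """Return (chrom, pos, id) for records where `sample` is heterozygous.

       Assumes the FORMAT column is exactly "GT", as in the test VCF.
       """
       lines = [ln for ln in vcf_text.splitlines() if ln.strip()]
       header = lines[0].lstrip("#").split()
       col = header.index(sample)
       sites = []
       for line in lines[1:]:
           fields = line.split()
           # Treat phased ("|") and unphased ("/") genotypes alike.
           alleles = fields[col].replace("|", "/").split("/")
           # Heterozygous: two called alleles that differ; "./." is skipped.
           if "." not in alleles and len(set(alleles)) == 2:
               sites.append((fields[0], int(fields[1]), fields[2]))
       return sites


   for name in ("sample1", "sample2"):
       print(name, het_sites(VCF_TEXT, name))

For ``sample1`` this yields chr1:100, chr1:400, and chr2:100; for ``sample2`` it yields chr1:200, chr1:400, and chr2:200.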

Step 1: Basic Allele Counting
-----------------------------

The simplest way to count alleles is to provide a BAM file and a VCF file:

.. code-block:: bash

   wasp2-count count-variants \
     pipelines/nf-modules/tests/data/minimal.bam \
     pipelines/nf-modules/tests/data/sample.vcf.gz \
     --out_file counts_basic.tsv

**Output:**

.. code-block:: text

   chr   pos  ref  alt  ref_count  alt_count  other_count
   chr1  100  A    G    1          0          0
   chr1  400  T    C    1          0          0
   chr2  100  A    T    1          0          0

Output Columns
~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Column
     - Description
   * - ``chr``
     - Chromosome
   * - ``pos``
     - Variant position (1-based)
   * - ``ref``
     - Reference allele
   * - ``alt``
     - Alternate allele
   * - ``ref_count``
     - Reads supporting reference allele
   * - ``alt_count``
     - Reads supporting alternate allele
   * - ``other_count``
     - Reads supporting neither allele (for example, sequencing errors or indels)
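
As a rough sketch of how these columns might be used downstream, the Python snippet below reads a counts table and computes the reference allele fraction ``ref_count / (ref_count + alt_count)`` per site. It assumes the file is tab-delimited with the header row shown above. The function name ``ref_fractions`` is hypothetical, and the rows are the Step 1 output pasted inline.

.. code-block:: python

   import csv
   import io

   # The Step 1 output, assumed here to be tab-delimited with a header row.
   COUNTS_TSV = (
       "chr\tpos\tref\talt\tref_count\talt_count\tother_count\n"
       "chr1\t100\tA\tG\t1\t0\t0\n"
       "chr1\t400\tT\tC\t1\t0\t0\n"
       "chr2\t100\tA\tT\t1\t0\t0\n"
   )


   def ref_fractions(handle):
       """Yield (chr, pos, total, ref_fraction) for each row of a counts table."""
       for row in csv.DictReader(handle, delimiter="\t"):
           ref = int(row["ref_count"])
           alt = int(row["alt_count"])
           total = ref + alt  # other_count is left out of the denominator
           # Avoid dividing by zero at sites with no informative reads.
           fraction = ref / total if total else None
           yield row["chr"], int(row["pos"]), total, fraction


   results = list(ref_fractions(io.StringIO(COUNTS_TSV)))
   for chrom, pos, total, fraction in results:
       print(chrom, pos, total, fraction)

To run this on a real file, replace the ``io.StringIO`` object with ``open("counts_basic.tsv", newline="")``. With a single read per site, as in the test data, the fractions are not biologically meaningful; the snippet only shows how the columns fit together.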

Step 2: Filter by Sample
------------------------

When your VCF contains multiple samples, use ``--samples`` to filter for heterozygous sites in a specific sample:

.. code-block:: bash

   wasp2-count count-variants \
     pipelines/nf-modules/tests/data/minimal.bam \
     pipelines/nf-modules/tests/data/sample.vcf.gz \
     --samples sample1 \
     --out_file counts_sample1.tsv

This returns only the three sites where ``sample1`` is heterozygous:

- chr1:100 (rs1)
- chr1:400 (rs4)
- chr2:100 (rs5)
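
If you want one counts file per sample, the command above can be generated in a loop. The following Python sketch only builds and prints the commands, using the subcommand and flags shown in this tutorial. The helper name ``build_count_command`` and the output file naming are assumptions made for illustration.

.. code-block:: python

   import shlex

   DATA_DIR = "pipelines/nf-modules/tests/data"


   def build_count_command(sample, out_file):
       """Build the argument list for one `wasp2-count count-variants` run."""
       return [
           "wasp2-count", "count-variants",
           f"{DATA_DIR}/minimal.bam",
           f"{DATA_DIR}/sample.vcf.gz",
           "--samples", sample,
           "--out_file", out_file,
       ]


   # One command per sample in the test VCF.
   commands = [
       build_count_command(sample, f"counts_{sample}.tsv")
       for sample in ("sample1", "sample2")
   ]
   for cmd in commands:
       print(shlex.join(cmd))

Each argument list can be passed to ``subprocess.run(cmd, check=True)`` to run the command from Python instead of printing it.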

Step 3: Annotate with Gene Regions
----------------------------------

Use ``--region`` to annotate variants with overlapping genomic features (genes, peaks, etc.):

.. code-block:: bash

   wasp2-count count-variants \
     pipelines/nf-modules/tests/data/minimal.bam \
     pipelines/nf-modules/tests/data/sample.vcf.gz \
     --samples sample1 \
     --region pipelines/nf-modules/tests/data/sample.gtf \
     --out_file counts_annotated.tsv

The output now includes gene annotations from the GTF file, allowing you to aggregate counts per gene for downstream analysis.
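
Per-gene aggregation could look roughly like the Python sketch below. This tutorial does not show the name of the annotation column, so it is passed in as the ``gene_column`` parameter; check the header of ``counts_annotated.tsv`` for the actual name. The rows are toy values made up for illustration, not WASP2 output, and ``aggregate_by_gene`` is a hypothetical helper.

.. code-block:: python

   from collections import defaultdict


   def aggregate_by_gene(rows, gene_column):
       """Sum ref_count and alt_count over all rows that share a gene label."""
       totals = defaultdict(lambda: {"ref_count": 0, "alt_count": 0})
       for row in rows:
           gene = row[gene_column]
           totals[gene]["ref_count"] += int(row["ref_count"])
           totals[gene]["alt_count"] += int(row["alt_count"])
       return dict(totals)


   # Toy rows: made up for illustration, NOT real WASP2 output.
   # "gene" is a hypothetical column name.
   toy_rows = [
       {"gene": "geneA", "ref_count": "3", "alt_count": "1"},
       {"gene": "geneA", "ref_count": "2", "alt_count": "2"},
       {"gene": "geneB", "ref_count": "0", "alt_count": "4"},
   ]
   per_gene = aggregate_by_gene(toy_rows, gene_column="gene")
   print(per_gene)

With real data, the rows would come from ``csv.DictReader`` over the annotated counts file, assuming it is tab-delimited.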

Next Steps
----------

Now that you have allele counts, you can:

1. **Analyze allelic imbalance** using ``wasp2-analyze find-imbalance``
2. **Compare between conditions** using ``wasp2-analyze compare-imbalance``
3. **Correct mapping bias** using ``wasp2-map`` (for WASP-filtered BAMs)

See Also
--------

* :doc:`/user_guide/counting` - Detailed counting options
* :doc:`/tutorials/scrna_seq` - Single-cell RNA-seq tutorial
* :doc:`/tutorials/comparative_imbalance` - Differential imbalance analysis