PONG2 enables scalable and accurate KIR genotyping by combining:
It supports hg19 and hg38 assemblies and is particularly useful for studying immune response variation, HLA–KIR interactions, and disease associations in diverse populations.
--fill-missing)
using minimac4
--threads
pong2 --help)

R version: ≥ 4.0
Required R packages (loaded at runtime):
- PONG2 (this package)
- readr
- tidyverse
- parallel

System tools (must be in PATH):
| Tool | Version | Required When |
|---|---|---|
| PLINK2 | ≥ 2.0 | Always |
| minimac4 | ≥ 4.1.6 | `--fill-missing` only |
| bgzip & tabix | HTSlib | `--fill-missing` only |
| Eagle2 | ≥ 2.4 | Pre-phasing before `--fill-missing` |
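
Before running anything, it can help to confirm which of these tools are visible on PATH. The sketch below is one way to do that; the binary names (`plink2`, `minimac4`, `bgzip`, `tabix`, `eagle`) are assumptions about how the tools are installed, and `have_tool` and `check_tools` are illustrative helpers, not part of PONG2.

```sh
# Report whether each external tool is on PATH.
# Binary names are assumptions; adjust them to match your installation.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Prints one line per tool and returns the number of missing tools.
check_tools() {
  missing=0
  for tool in "$@"; do
    if have_tool "$tool"; then
      printf 'found    %s\n' "$tool"
    else
      printf 'MISSING  %s\n' "$tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# plink2 is always required; the others matter only for --fill-missing.
check_tools plink2 minimac4 bgzip tabix eagle || echo "Some tools are not on PATH."
```

This checks presence only; version requirements from the table still need to be verified by hand.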
# Install remotes if needed
if (!require("remotes", quietly = TRUE)) install.packages("remotes")
# Install PONG2
remotes::install_github("NormanLabUCD/PONG2")

Download PONG2_1.0.0.tar.gz from the latest release:
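
Once the tarball is downloaded, a source install from the shell might look like the sketch below. `R CMD INSTALL` is R's standard installer; the helper name `install_pong2_tarball` and the assumption that the file sits in the current directory are illustrative only.

```sh
# Install a downloaded PONG2 source tarball with R's standard installer.
# install_pong2_tarball is a hypothetical helper, not part of PONG2.
install_pong2_tarball() {
  tarball=$1
  if [ ! -f "$tarball" ]; then
    echo "error: $tarball not found; download it from the release page first" >&2
    return 1
  fi
  R CMD INSTALL "$tarball"
}

# Assumes the file was saved to the current directory.
install_pong2_tarball PONG2_1.0.0.tar.gz || echo "Install skipped."
```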
library(PONG2)
#> PONG2: Please add '/home/suraju/.local/bin' to your PATH:
#> export PATH="/home/suraju/.local/bin:$PATH"
#> ** initializing environment
#> PONG2 (Genotype Imputation with Attribute Bagging): v2.0.0
#> Supported by Streaming SIMD Extensions 2 (SSE2)
packageVersion("PONG2")
#> [1] '1.0.1'

Pre-phase your data first (see Pre-phasing section), then:
pong2 impute \
--vcf data/chr19.phased.vcf.gz \
-o results/imputed \
-l KIR3DL1 \
-a hg38 \
--fill-missing \
-t 20

Note: `--vcf` (pre-phased VCF) is the only input required with `--fill-missing`. PLINK files cannot hold phased haplotype data, so the pipeline derives everything from the VCF.
pong2 --help # General overview + list of commands
pong2 --help impute # Detailed help for imputation
pong2 --help train # Detailed help for training
pong2 version # Show version number

impute command
| Flag | Description | Example |
|---|---|---|
| `-i, --bfile` | PLINK bed/bim/fam prefix (normal imputation) | `data/chr19` |
| `--vcf` | Pre-phased VCF file (required with `--fill-missing`) | `data/chr19.phased.vcf.gz` |
| `-o, --output` | Output directory (created if it doesn’t exist) | `results/imputation` |
| `-l, --locus` | KIR locus to impute | `KIR3DL1` |
| `-a, --assembly` | Genome build | `hg19` or `hg38` |
Note: `-i` and `--vcf` are mutually exclusive:

- Normal imputation: use `-i` (PLINK bfile)
- `--fill-missing`: use `--vcf` only (PLINK files are derived internally from the VCF)
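
The input rule can be checked in a wrapper script before `pong2 impute` is called. The function below is a hypothetical pre-flight check, not part of PONG2. It also assumes that `--vcf` is only meaningful together with `--fill-missing`, which is one reading of the note.

```sh
# Hypothetical pre-flight check (not part of PONG2): make sure exactly one
# input mode is chosen before calling `pong2 impute`.
#   $1 = PLINK prefix for -i (may be empty)
#   $2 = VCF path for --vcf (may be empty)
#   $3 = "yes" if --fill-missing will be passed, otherwise "no"
check_impute_inputs() {
  bfile=$1
  vcf=$2
  fill=$3
  if [ -n "$bfile" ] && [ -n "$vcf" ]; then
    echo "error: -i and --vcf are mutually exclusive" >&2
    return 1
  fi
  if [ "$fill" = "yes" ] && [ -z "$vcf" ]; then
    echo "error: --fill-missing requires --vcf" >&2
    return 1
  fi
  if [ "$fill" != "yes" ] && [ -z "$bfile" ]; then
    echo "error: normal imputation requires -i" >&2
    return 1
  fi
  return 0
}

check_impute_inputs "data/chr19" "" no && echo "normal imputation: ok"
check_impute_inputs "" "data/chr19.phased.vcf.gz" yes && echo "fill-missing: ok"
```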
train command
| Flag | Description | Example |
|---|---|---|
| `-i, --bfile` | Reference PLINK bed/bim/fam prefix | `data/chr19` |
| `-k, --kfile` | CSV with sample IDs and phased KIR allele calls | `data/kir_calls.csv` |
| `-o, --output` | Directory to save trained model | `models/KIR3DL1` |
| `-l, --locus` | KIR locus to train | `KIR3DL1` |
| `-a, --assembly` | Genome build | `hg19` or `hg38` |
| Flag | Default | Description |
|---|---|---|
| `-t, --threads` | 4 | Number of CPU threads |
| `--nclassifier` | 100 | Number of ensemble classifiers |
| `--split` | 0.7 | Train/validation split proportion |
| `--kirmaf` | 0.00 | Minimum KIR allele frequency filter |
| `--mac` | 3 | Minimum allele count for SNPs |
| `-r, --region` | Optimized default | Custom KIR region (e.g. `55281035-55295784`) |
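
Put together, a training call that spells out the required flags and a few of the optional ones could look like the sketch below. The paths are the example values from the tables, the optional values are the listed defaults, and the `run` and `train_locus` helpers plus the `DRY_RUN` switch are illustrative assumptions rather than PONG2 features.

```sh
# Build a `pong2 train` command line from the documented flags.
# DRY_RUN=1 (the default here) only prints the command; set it to 0 to execute.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = KIR locus, $2 = genome build (hg19 or hg38)
train_locus() {
  run pong2 train \
    -i data/chr19 \
    -k data/kir_calls.csv \
    -o "models/$1" \
    -l "$1" \
    -a "$2" \
    -t 4 --nclassifier 100 --split 0.7 --mac 3
}

train_locus KIR3DL1 hg38
```

Keeping `--split` below 1 leaves a held-out portion of the samples for later evaluation.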
evaluate command
Evaluate a trained model against the held-out validation set directly from the terminal:
| Flag | Description | Example |
|---|---|---|
| `--model-dir` | Directory containing trained model files | `models/KIR3DL1` |
| `-l, --locus` | KIR locus to evaluate | `KIR3DL1` |
| `--threshold` | Minimum confidence threshold for calls | `0.5` |
Note: Requires `--split < 1` during training to generate held-out test data.
Pre-phasing is required before using `--fill-missing`. Use Eagle2 to phase your chr19 data:
NOTE: KIR Region SNP Overlap between input data and 1KGP
Overlap rate is computed between your input data and the 1000 Genomes Project (1KGP) reference panel in the KIR region.
| Overlap Rate | Status | Action |
|---|---|---|
| ≥ 50% | Pass | Proceed with PONG2 directly |
| < 50% | Fail | Run Eagle2 + minimac4 pre-imputation first |
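
To get a rough feel for this number before running PONG2, the overlap can be approximated from two PLINK `.bim` files. The sketch below is not PONG2's own calculation. It assumes that the rate is the share of reference-panel positions in the region that also appear in the input, that both files are already restricted to chr19, and that matching by base-pair position alone is good enough. The function names are illustrative.

```sh
# Share (in %) of reference-panel positions inside a region that also occur
# in the input data. Both files are PLINK .bim files (column 4 = bp position).
# Matching is by position only; allele checks are omitted in this sketch.
#   $1 = input .bim, $2 = reference .bim, $3 = region start, $4 = region end
overlap_rate() {
  awk -v lo="$3" -v hi="$4" '
    $4 < lo || $4 > hi  { next }                  # keep region positions only
    FILENAME == ARGV[1] { seen[$4] = 1; next }    # first file: input data
    { n_ref++; if ($4 in seen) n_hit++ }          # second file: reference
    END {
      if (n_ref == 0) print "NA"
      else printf "%.1f\n", 100 * n_hit / n_ref
    }
  ' "$1" "$2"
}

# Apply the 50% rule: prints Pass or Fail for a given rate.
overlap_status() {
  awk -v r="$1" 'BEGIN { if (r + 0 >= 50) print "Pass"; else print "Fail" }'
}

# Usage (file names and region bounds are placeholders):
#   overlap_status "$(overlap_rate my_data.bim ref_1kgp.bim 55000000 55400000)"
```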
# Step 1: Pre-phase with Eagle2
eagle \
--bfile=chr19 \
--geneticMapFile=genetic_map_hg19.txt.gz \
--outPrefix=chr19.phased \
--chrom=19 \
--numThreads=20 \
--bpStart=55000000 \
--bpEnd=55400000
# Step 2: Run PONG2 with --fill-missing (VCF only — no -i needed)
pong2 impute \
--vcf chr19.phased.vcf.gz \
-o results/imputed \
-l KIR3DL1 \
-a hg19 \
--fill-missing \
-t 20

Pre-impute your chr19 data using a public server before running PONG2:
Step 1: Phase chr19 with Eagle2 (see above)
Step 2: Upload phased VCF to Michigan Imputation Server or TOPMed (recommended for diverse populations)
Step 3: Download imputed VCF and convert to PLINK:
Step 4: Run PONG2:
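
Steps 3 and 4 might look like the sketch below. The file names are hypothetical, `plink2 --vcf ... --make-bed --out ...` is the usual PLINK2 way to convert a VCF to bed/bim/fam, and the `run` helper with its `DRY_RUN` switch is an illustrative assumption. The assembly passed to `-a` should match the build of the server's output; `hg38` is used here only as an example.

```sh
# DRY_RUN=1 (the default here) prints commands only; set to 0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

IMPUTED_VCF=chr19.imputed.vcf.gz   # hypothetical name of the server download
PREFIX=chr19.imputed               # hypothetical PLINK output prefix

# Step 3: convert the imputed VCF to PLINK bed/bim/fam.
run plink2 --vcf "$IMPUTED_VCF" --make-bed --out "$PREFIX"

# Step 4: normal imputation from the PLINK prefix (no --fill-missing).
run pong2 impute -i "$PREFIX" -o results/imputed -l KIR3DL1 -a hg38 -t 20
```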