Train KIR genotype prediction models using parallel attribute bagging
across multiple CPU cores. This is the core training function used
by the pong2 train CLI command.
kirParallelAttrBagging(
cl,
hla,
snp,
auto.save = "",
nclassifier = 100,
mtry = c("sqrt", "all", "one"),
prune = TRUE,
rm.na = TRUE,
stop.cluster = FALSE,
verbose = TRUE
)
cl: a cluster object created by parallel::makeCluster() for
parallel computation across multiple CPU cores
hla: a KIR allele table object of class hlaAlleleClass
containing training allele calls
snp: a SNP genotype object of class hlaSNPGenoClass
containing training SNP data
auto.save: character string; file path prefix for auto-saving
classifiers during training. Use "" (default) to disable
nclassifier: integer; number of individual ensemble classifiers to train (default: 100)
mtry: character; number of SNPs randomly selected at each node.
One of "sqrt" (default), "all", or "one"
prune: logical; if TRUE (default), prune classifiers
rm.na: logical; if TRUE (default), remove samples with
missing KIR allele calls
stop.cluster: logical; if TRUE, stop the parallel cluster
after training (default: FALSE)
verbose: logical; if TRUE (default), print progress
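The three mtry choices can be read as rules that turn the number of available SNPs into a per-node candidate count. The sketch below illustrates one plausible reading. The helper resolve_mtry() is hypothetical and not part of PONG2, and treating "sqrt" as the rounded-up square root of the SNP count is an assumption, since this page does not state the exact rounding. It also shows the common R idiom behind a default such as c("sqrt", "all", "one"): match.arg() selects the first element when the argument is left unset.

```r
# Hypothetical helper (not part of PONG2): turn an mtry rule into a SNP count.
resolve_mtry <- function(mtry = c("sqrt", "all", "one"), n_snp) {
  # match.arg() returns "sqrt" when mtry is left at its vector default
  mtry <- match.arg(mtry)
  switch(mtry,
    sqrt = ceiling(sqrt(n_snp)),  # assumption: round the square root up
    all  = n_snp,                 # consider every SNP at each node
    one  = 1L                     # consider a single SNP at each node
  )
}

resolve_mtry(n_snp = 200)         # 15
resolve_mtry("all", n_snp = 200)  # 200
resolve_mtry("one", n_snp = 200)  # 1
```

Under these assumptions, a training set with 200 SNPs would draw 15 candidate SNPs per node with the default setting.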
An object of class hlaAttrBagClass representing the trained
PONG2 KIR prediction model. The object contains:
integer; number of training samples
integer; number of SNP predictors used
character; the KIR locus name
character vector; KIR alleles in the model
list; individual ensemble classifiers
numeric; out-of-bag accuracy estimate
Use kirPredict() to apply the model to new samples.
# Load example data
data(PONG2_example)
#> Warning: data set ‘PONG2_example’ not found
# Set up parallel cluster
cl <- parallel::makeCluster(2)
# Train a small model
model <- kirParallelAttrBagging(
cl = cl,
hla = example_kir,
snp = example_snp,
nclassifier = 20,
verbose = FALSE
)
parallel::stopCluster(cl)
# View model summary
print(model)
#> Gene: KIR3DL1
#> Training dataset: 50 samples X 200 SNPs
#> # of KIR3DL1/S1 alleles: 5
#> # of individual classifiers: 20
#> total # of SNPs used: 133
#> average # of SNPs in an individual classifier: 9.95, sd: 2.39, min: 6, max: 15
#> average # of haplotypes in an individual classifier: 127.70, sd: 72.24, min: 46, max: 319
#> average out-of-bag accuracy: 62.16%, sd: 6.64%, min: 52.63%, max: 75.00%
#> Genome assembly: hg19
# Clean up
hlaClose(model)
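Because stop.cluster defaults to FALSE, the caller is responsible for stopping the cluster, and an error during training would skip the stopCluster() call in the example above. A minimal sketch of one way to guard against that follows. The wrapper with_cluster() is hypothetical and not part of PONG2, and the squaring task is only a stand-in so the code runs without PONG2 installed.

```r
# Hypothetical wrapper (not part of PONG2): run fn(cl) and always stop the cluster.
with_cluster <- function(n_workers, fn) {
  cl <- parallel::makeCluster(n_workers)
  # on.exit() fires even if fn() raises an error, so workers are not leaked
  on.exit(parallel::stopCluster(cl), add = TRUE)
  fn(cl)
}

# Stand-in task; in practice the function body would call
# kirParallelAttrBagging(cl = cl, ...) with the training data.
squares <- with_cluster(2, function(cl) {
  parallel::parLapply(cl, 1:4, function(i) i^2)
})
unlist(squares)
```

When using a wrapper like this, leaving stop.cluster at its default avoids stopping the same cluster twice.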