Train KIR genotype prediction models using parallel attribute bagging across multiple CPU cores. This is the core training function used by the pong2 train CLI command.

kirParallelAttrBagging(
  cl,
  hla,
  snp,
  auto.save = "",
  nclassifier = 100,
  mtry = c("sqrt", "all", "one"),
  prune = TRUE,
  rm.na = TRUE,
  stop.cluster = FALSE,
  verbose = TRUE
)

Arguments

cl

a cluster object created by parallel::makeCluster() for parallel computation across multiple CPU cores

hla

a KIR allele table object of class hlaAlleleClass containing training allele calls

snp

a SNP genotype object of class hlaSNPGenoClass containing training SNP data

auto.save

character string; file path prefix for auto-saving classifiers during training. Use "" (default) to disable auto-saving

nclassifier

integer; number of individual classifiers in the ensemble (default: 100)

mtry

character; rule for how many SNPs are randomly selected at each node. One of "sqrt" (default), "all", or "one"

prune

logical; if TRUE (default), prune classifiers

rm.na

logical; if TRUE (default), remove samples with missing KIR allele calls

stop.cluster

logical; if TRUE, stop the parallel cluster after training (default: FALSE)

verbose

logical; if TRUE (default), print progress information during training
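
The cl and stop.cluster arguments together determine who is responsible for shutting the cluster down. The following is a minimal sketch, using only the base parallel package, of creating a cluster and making sure it is stopped even if the work inside fails. The helper name with_cluster and the placeholder workload are hypothetical and not part of PONG2; in real use the placeholder would be replaced by a training call.

```r
library(parallel)

# Hypothetical helper (not part of PONG2): create a cluster, run fun(cl),
# and always stop the cluster afterwards, even if fun() raises an error.
with_cluster <- function(n_workers, fun) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl), add = TRUE)
  fun(cl)
}

# Placeholder workload standing in for a training call such as
# kirParallelAttrBagging(cl, hla, snp, ...).
squares <- with_cluster(2, function(cl) {
  parSapply(cl, 1:4, function(i) i^2)
})
print(squares)
```

With a pattern like this, stop.cluster can stay at its default of FALSE so that the cluster is stopped in one place only.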

Value

An object of class hlaAttrBagClass representing the trained PONG2 KIR prediction model. The object contains:

n.samp

integer; number of training samples

n.snp

integer; number of SNP predictors used

hla.locus

character; the KIR locus name

hla.allele

character vector; KIR alleles in the model

classifiers

list; individual ensemble classifiers

out.of.bag.acc

numeric; out-of-bag accuracy estimate

Use kirPredict() to apply the model to new samples.
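
For readers unfamiliar with out-of-bag estimates such as out.of.bag.acc, the sketch below shows the general bagging idea in base R: each classifier is fitted on a bootstrap sample, and the samples it never saw are used to score it. This is a generic illustration on assumed toy data, not PONG2's internal implementation; the majority-label "classifier" and all variable names are hypothetical.

```r
set.seed(1)

# Toy labels standing in for allele calls (hypothetical data).
labels <- factor(sample(c("A", "B"), size = 50, replace = TRUE,
                        prob = c(0.7, 0.3)))

# Out-of-bag accuracy for one bootstrap replicate.
oob_accuracy <- function(labels) {
  n <- length(labels)
  in_bag <- sample.int(n, size = n, replace = TRUE)  # bootstrap sample
  oob <- setdiff(seq_len(n), in_bag)                 # samples never drawn
  # Stand-in "classifier": always predict the most frequent in-bag label.
  majority <- names(which.max(table(labels[in_bag])))
  mean(labels[oob] == majority)
}

# One accuracy value per replicate, then the ensemble-level average.
acc <- replicate(20, oob_accuracy(labels))
mean(acc)
```

Averaging one accuracy value per classifier mirrors the "average out-of-bag accuracy" line, with its sd, min, and max, printed in the model summary.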

Examples

# Load example data
data(PONG2_example)
#> Warning: data set ‘PONG2_example’ not found

# Set up parallel cluster
cl <- parallel::makeCluster(2)

# Train a small model
model <- kirParallelAttrBagging(
  cl          = cl,
  hla         = example_kir,
  snp         = example_snp,
  nclassifier = 20,
  verbose     = FALSE
)

parallel::stopCluster(cl)

# View model summary
print(model)
#> Gene: KIR3DL1
#> Training dataset: 50 samples X 200 SNPs
#> 	# of KIR3DL1/S1 alleles: 5
#> 	# of individual classifiers: 20
#> 	total # of SNPs used: 133
#> 	average # of SNPs in an individual classifier: 9.95, sd: 2.39, min: 6, max: 15
#> 	average # of haplotypes in an individual classifier: 127.70, sd: 72.24, min: 46, max: 319
#> 	average out-of-bag accuracy: 62.16%, sd: 6.64%, min: 52.63%, max: 75.00%
#> Genome assembly: hg19

# Clean up
hlaClose(model)