User options

The program requires a list of options to be specified by the user either as command-line arguments, or in a text file the name of which is given as a single argument to the program. As explained above, the most convenient way to specify these arguments is to use a Perl script (see "admixmap.pl").

If the options are specified in a text file, they must be indicated as optionname=optionvalue.
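As an illustration of the optionname=optionvalue layout, the following Python sketch writes a small options file. The option names are taken from this manual; the values and the file name options.txt are hypothetical and chosen only for the example.

```python
import os
import tempfile


def format_options(options):
    """Return one 'optionname=optionvalue' line per option."""
    return "\n".join(f"{name}={value}" for name, value in options.items()) + "\n"


# Hypothetical values, for illustration only.
options = {
    "samples": 1100,
    "burnin": 100,
    "every": 5,
    "resultsdir": "results",
    "logfile": "logfile.txt",
}

text = format_options(options)

# Write to a temporary directory so that the sketch has no side effects.
path = os.path.join(tempfile.mkdtemp(), "options.txt")  # hypothetical file name
with open(path, "w") as handle:
    handle.write(text)
```

The resulting file name would then be passed as the single argument to the program.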

General Options
Allele / Haplotype Frequency Model
Data Files
Model Specification
Prior Specification
Output Files
Tests and Diagnostics

1. General Options

samples

Integer specifying total number of iterations of the Markov chain, including burn-in. With strong priors and informative markers, a run of about 500 should suffice for inference. Otherwise, a run of at least 20 000 iterations may be necessary. See the description of ergodicaveragefile (under Output Files) for how to determine whether the run has been long enough.

burnin

Integer specifying number of iterations for burn-in of the Markov chain, before posterior samples are output. A burn-in of at least 50 iterations is recommended for inference. For analyses requiring long runs, a burn-in of up to 500 may be required.

every

Integer specifying the "thinning" of samples from the posterior distribution that are written to the output files, after the burn-in period. For example, if every=10, sampled values are written to the output files every 10 iterations. We recommend using a value of 5 to keep down the size of the output files. Sampling more frequently than this does not much improve the precision of results, because successive draws are not independent. Thinning the output samples does not affect the calculation of ergodic averages or test statistics, which are based on all sampled values.

Note that every must be no greater than (samples - burnin) / 10, or some output files may be empty.
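The constraint on every can be checked before starting a run. The sketch below is a hypothetical helper, not part of the program; the count of retained draws is an approximation that assumes one draw is written every "every" iterations after burn-in.

```python
def check_every(samples, burnin, every):
    """Return True if every is no greater than (samples - burnin) / 10."""
    return every <= (samples - burnin) / 10


def retained_draws(samples, burnin, every):
    """Approximate number of post-burn-in draws written to the output files."""
    return (samples - burnin) // every
```

For example, samples=20000, burnin=500 and every=5 satisfy the constraint, whereas samples=500, burnin=50 and every=50 do not, because (500 - 50) / 10 = 45.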

numannealedruns

If thermo=0, this specifies the number of "annealing" runs during burnin. This usually improves mixing.

If thermo=1, this specifies the number of "temperatures" at which to run in order to estimate the marginal likelihood by thermodynamic integration.

Default is 20.

displaylevel

0 - silent mode; Only start and finish times output to screen.

1 - quiet mode; Model specification, priors, test results and diagnostics written to screen.

2 - normal mode; more verbose information and an iteration counter output to screen.

>2 - monitor mode; population-level parameters also written to screen with frequency specified by every.

resultsdir

Path of directory for output files. Default is 'results'.

logfile

Name of log file written by the program. Default is 'logfile.txt'.

seed

Sets a seed for the random number generator.

2. Options for allele frequency Model

The program requires one of the following three options, any one of which specifies the number of subpopulations in the model: populations, priorallelefreqfile, or historicallelefreqfile. These options are mutually exclusive.

The allelefreqfile option is no longer supported: instead, specify the options priorallelefreqfile and fixedallelefreqs=1, and supply the allele frequencies in priorallelefreqfile format (without adding 0.5 to each cell).

populations

Integer specifying number of subpopulations that have contributed to the admixed population under study. If specified as 1, the program fits a model based on a single homogeneous population. This option is not required (and is ignored) if information about allele frequencies is supplied in allelefreqfile, priorallelefreqfile, or historicallelefreqfile, as the number of columns in any of these files defines the number of subpopulations in the model. If none of these files are specified, the parameters of the Dirichlet priors for allele or haplotype frequencies default to 1/n, where n is the number of alleles or haplotypes at each compound locus.

priorallelefreqfile

When this option is specified, the program fits a model in which the allele frequencies in each subpopulation are estimated simultaneously from the unadmixed samples and the admixed sample under study.

This file contains parameter values for the Dirichlet prior distribution of the allele or haplotype frequencies at each compound locus in each subpopulation. At each compound locus with S alleles or S possible haplotypes, a Dirichlet prior distribution is specified by a vector of S positive numbers. Where these alleles or haplotypes have been counted directly in samples from an unadmixed subpopulation, the parameter values should be specified as 0.5 plus the observed counts of each allele (this is equivalent to combining the counts with a reference prior). Where no information is available about allele or haplotype frequencies at a compound locus in a subpopulation, or no copies of the allele have been observed in the sample from that subpopulation, specify 0.5 in the corresponding cells. Specifying 0.5 in all cells, with columns for K subpopulations, is equivalent to specifying the option populations = K.
Where haplotype frequencies at a compound locus have been estimated from unordered genotypes, the user should supply the parameters of the Dirichlet distribution that most closely approximates the posterior distribution of haplotype frequencies given the observed genotypes and a reference prior, as described above. The first row is a header row, consisting of strings in quotes, separated by spaces. The first string in this row is ignored, and the subsequent strings specify the names of the ancestral subpopulations contributing to the admixed population.
After the header row, there is one row for each allele (or haplotype) at each compound locus. The first column in each row gives the name of the compound locus in quotes. Subsequent columns give the prior parameters for the frequency of the allele (or haplotype) in each subpopulation, separated by a single space.
If the compound locus consists of two or more simple loci (see notes above), the rows list prior parameters for the haplotypes in the order defined by incrementing a counter from right to left. For instance, if there were three loci A, B, C, with 4, 2 and 3 alleles respectively, the haplotypes would be listed in the following order: 1-1-1, 1-1-2, 1-1-3, 1-2-1, 1-2-2, 1-2-3, 2-1-1, ..., 4-2-2, 4-2-3. Estimated counts should be given for all possible haplotypes, however rare. The program will include all possible haplotypes in the model, but will omit rare haplotypes when constructing test statistics.
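The haplotype ordering described above (rightmost locus incremented first) can be generated mechanically. The following Python sketch reproduces the order for the example of three loci with 4, 2 and 3 alleles; the function name is hypothetical.

```python
from itertools import product


def haplotype_order(allele_counts):
    """List haplotype labels with the rightmost locus varying fastest."""
    ranges = [range(1, n + 1) for n in allele_counts]
    # itertools.product advances its last argument fastest, which matches
    # incrementing a counter from right to left.
    return ["-".join(str(allele) for allele in combo) for combo in product(*ranges)]


haplotypes = haplotype_order([4, 2, 3])
```

The list has 4 x 2 x 3 = 24 entries, one for each row required for this compound locus.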

historicallelefreqfile

This file contains observed counts of alleles or haplotypes at each compound locus in samples from unadmixed subpopulations. When this option is specified, the program fits a model that allows the "historic" allele frequencies in the unadmixed population to vary from the corresponding ancestry-specific allele frequencies in the admixed population under study.

The format of this file is exactly the same as the format of priorallelefreqfile described above. The only difference between the two files is that in historicallelefreqfile, 0.5 is not added to the observed counts.
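The relationship between the two files can be sketched as follows: the same table of observed counts is written unchanged for historicallelefreqfile, and with 0.5 added to every cell for priorallelefreqfile. The counts and the locus name below are hypothetical, and the helper functions are illustrative only.

```python
def prior_parameters(counts):
    """Add 0.5 to each observed count (rows: alleles, columns: subpopulations)."""
    return [[count + 0.5 for count in row] for row in counts]


def format_rows(locus_name, rows):
    """Format one line per allele: quoted locus name, then space-separated values."""
    return [f'"{locus_name}" ' + " ".join(str(value) for value in row) for row in rows]


# Hypothetical counts for a diallelic locus in two subpopulations.
observed_counts = [[12, 3], [0, 9]]

historic_lines = format_rows("LocusA", observed_counts)
prior_lines = format_rows("LocusA", prior_parameters(observed_counts))
```

An allele that was not observed in a subpopulation (count 0) receives the value 0.5 in the prior file, in line with the description of priorallelefreqfile.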

3. Options that specify data files

locusfile

This file contains information about each simple locus: that is, each locus that is typed. The first row of the file is ignored by the program, and can be used as a header. Each subsequent row contains values of four variables: locus name; number of alleles at this locus; genetic map distance in Morgans, centimorgans or megabases between this locus and the previous locus; and the name of the chromosome where the locus is located. The last column is optional if none of the loci lie on the X chromosome. If distances are supplied in centimorgans, the header of the distance column must contain "cm" or "cM". If the distances are supplied in megabases, the header should contain "mb" or "Mb". Loci must be ordered by their map positions on the genome. Locus names should contain only alphanumeric characters (no spaces, dots or hyphens). If the previous locus is unlinked, the genetic map distance should be coded as "NA", "#" or ".". Loci considered too far apart to be linked may also be treated as unlinked. For two or more loci that are so close together that they should be analysed as a single compound locus (as with DRD2Bcl and DRD2Taqd in the tutorial), map distance should be coded as 0.
The webpage http://actin.ucd.ie/cgi-bin/rs2cm.cgi can be used to obtain the genetic map positions (in cM) of a list of SNPs, which, once converted to distances, may be specified in the locusfile.
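Converting map positions to the distances expected in the locusfile amounts to taking differences between consecutive loci. The sketch below assumes the loci are already sorted by map position and that the first locus on each chromosome is to be treated as unlinked to the previous one (coded "NA"); the locus names and positions are hypothetical.

```python
def positions_to_distances(loci):
    """Convert (name, chromosome, position) tuples to (name, distance, chromosome).

    The first locus on each chromosome is coded "NA" (assumed unlinked).
    """
    rows = []
    previous_chromosome = None
    previous_position = None
    for name, chromosome, position in loci:
        if chromosome != previous_chromosome:
            distance = "NA"
        else:
            distance = round(position - previous_position, 6)
        rows.append((name, distance, chromosome))
        previous_chromosome, previous_position = chromosome, position
    return rows


# Hypothetical map positions in cM.
example = [("rs1", "1", 10.0), ("rs2", "1", 12.5), ("rs3", "2", 3.0)]
distances = positions_to_distances(example)
```

If the distances are in centimorgans, the header of the distance column must contain "cm" or "cM", as noted above.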

genotypesfile

Specifies the path to a file containing genotypes for each individual typed. The first row of the file is a header row listing locus names, enclosed in quotes and separated by spaces. Locus names should be exactly the same as in the file locusfile. Loci must be ordered by their map positions on the genome. Each subsequent row contains genotype data for a single individual. Each line contains the individual ID, the individual's sex, coded as 1 for male, 2 for female, 0 for missing, followed by observed genotypes at each locus, optionally enclosed in quotes. The sex column may be omitted if none of the loci are on the X chromosome. Haploid genotypes (including X chromosome genotypes for males) are coded as single integers. Diploid genotypes are coded as pairs of integers separated by a comma. Where there are a alleles at a locus, the alleles should be coded as numbers from 1 to a. Missing genotypes are coded as "0,0" (or "0" for haploid genotypes).

For compatibility with existing datasets, we plan to change this file format to one more similar to the PEDFILE format used with LINKAGE.

outcomevarfile

This file contains values of one or more outcome variables. After the header row, the file has one row per individual. Binary variables should be coded as 1 = affected, 0 = unaffected. Missing values are coded as #. The header row contains the variable labels in quotes separated by spaces. If the file contains more than one outcome variable, the column(s) containing the variable(s) of interest should be specified by the outcomevarcols option, otherwise all columns are used. For example, 'outcomevarcols=1,3' uses the first and third columns.

coxoutcomevarfile

This file contains survival data for a Cox regression model. After the header row, the file has one row per individual and three columns. The first column contains the times when each individual began to be observed; the second contains the times when each individual ceased being observed; and the last contains the number of events that occurred during the observed period (usually 0 or 1). The start and finish times must be numeric and relative to the same point in time (usually the first start time).

covariatesfile

This file contains values of covariates to be included in the regression model. It is used only if an outcomevarfile has been specified, and is optional even then. The header row contains covariate names in quotes, separated by spaces. Subsequent rows contain the observed values of these variables. For computational reasons, the values of the covariates should be centred about their sample means. Missing values are coded as #. If the file contains more than one covariate, the column(s) containing the covariate(s) of interest should be specified by the covariatecols option, otherwise all columns are used.
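Centring a covariate about its sample mean can be done before writing the covariatesfile. The following sketch is one possible way to do this for a single column, leaving missing values (coded as #) untouched; the function name is hypothetical, and the mean is taken over the observed values only, which is an assumption.

```python
def centre_column(values, missing="#"):
    """Subtract the mean of the observed values; keep missing codes as they are."""
    observed = [float(value) for value in values if value != missing]
    mean = sum(observed) / len(observed)
    return [value if value == missing else float(value) - mean for value in values]


centred = centre_column(["1", "2", "#", "3"])
```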

outcomevarcols

Valid only with outcomevarfile.

Vector of integers specifying the columns of the outcomevarfile to use. For example, "outcomevarcols = 3, 1" specifies to regress on the third and first variables, in that order. If not specified, all columns are used.

covariatecols

Valid only with both outcomevarfile and covariatesfile.

Vector of integers specifying the columns of the covariatesfile to use. If not specified, all columns are used.

reportedancestry

Not fully tested or documented: allows prior information about each individual’s ancestry to be specified in the model.

testgenotypesfile

This file contains genotypes for each individual in the genotypesfile at diallelic loci not included in the model due to large haplotypes not being modelled. The format is the same as for the genotypesfile above, except that genotypes should be coded as 0 for "1,1", 1 for "1,2" and 2 for "2,2". Missing genotypes should be coded as NA. The file is not used by the program itself but will indicate that, provided there is a regression model, "offline" score tests are to be carried out in the R script.
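The recoding from genotypesfile format to testgenotypesfile format is a count of copies of allele 2. The sketch below is a hypothetical helper for diploid genotypes at diallelic loci; treating "2,1" the same as "1,2" is an assumption, since the manual lists only "1,2".

```python
def recode_test_genotype(genotype):
    """Recode a diploid genotype such as "1,2" as the number of copies of allele 2.

    Missing genotypes ("0,0") are returned as "NA".
    """
    alleles = genotype.strip('"').split(",")
    if "0" in alleles:
        return "NA"
    return str(sum(1 for allele in alleles if allele == "2"))


recoded = [recode_test_genotype(g) for g in ["1,1", "1,2", "2,2", "0,0"]]
```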

4. Model Specification

indadmixhiermodel

0 - Model for a collection of individuals in which the admixture proportions of each individual's parents, and the sum of intensities on each parental gamete, are statistically independent given the priors on these parameters.

This option is useful in two situations: (1) when you already have strong prior information about the distribution of admixture in the population from which the individuals have been sampled, and want to specify a Dirichlet prior for each individual’s parental admixture proportions using the option initalpha0; or (2) when you want to calculate the marginal likelihood of the model given the genotype data on each individual.

1 (default) - Hierarchical model on individual admixture

randommatingmodel

0 (default) - assortative mating model (admixture proportions the same in both parents)

1 - random mating model

globalrho

0 - the sum of intensities parameter r is allowed to vary between individuals (or between gametes if a random mating model is specified). This specifies a hierarchical model, with a gamma distribution for the variation of r between individuals specified as below.

1 (default) - the sum of intensities r is modelled as a global parameter, set to be the same on all parental gametes

globalpsi

0 - individual-level odds ratio female/male between ancestral populations (psi)

1 (default) - population level odds ratio psi

The prior parameters on log psi are set through the 'oddsratiosprior' option.

fixedallelefreqs

1 specifies that priorallelefreqfile contains fixed allele frequencies

0 (default) otherwise

correlatedallelefreqs

valid only with 'populations' or 'priorallelefreqfile' options

1 specifies a correlated allele frequency model

0 (default) otherwise

poplabels

A list of strings specifying the labels for the subpopulations in the model. For example, 'poplabels = Afr, Eur'. It is ignored unless the populations option is used, and must have length equal to the value of populations.

5. Prior Specification

sumintensitiesprior, globalsumintensitiesprior

In a model with global sumintensities or without a hierarchical model of individual admixture, the sum of intensities parameter has a Gamma(a, b) prior specified as globalsumintensitiesprior="a,b". Default values for a and b are 3 and 0.5, giving a prior mean of 6 and prior variance of 12.
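The default values can be checked with the usual moments of a gamma distribution with shape a and rate b (mean a/b, variance a/b²). Reading b as a rate parameter is an assumption here, but it is the reading that reproduces the mean of 6 and variance of 12 quoted above.

```python
def gamma_moments(shape, rate):
    """Mean and variance of a gamma distribution with the given shape and rate."""
    return shape / rate, shape / rate ** 2


default_mean, default_variance = gamma_moments(3, 0.5)
```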

Otherwise (indadmixhiermodel=1 and globalrho=0), the sum of intensities parameter r has a Gamma(a,b) prior distribution and the scale parameter b has a beta hyperprior with parameters b0 and b1. This specifies a "GammaGamma" prior, which has mean

E(r) = ab₁ / (b₀ - 1) and variance E(r)(E(r)+1) / (b₀-2).

The three parameters of this prior are specified with sumintensitiesprior. The three values must be enclosed in quotes and separated by commas, e.g. sumintensitiesprior="2,3,4".

Thus, for instance, to model an African-American population, for which we have prior information that the sum of intensities parameter is about 6 per morgan, we could specify

sumintensitiesprior = "6,40,39"

This specifies the prior for the sum of intensities parameter r as Gamma(6, 1) which has mean 6 and variance 1.
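As a worked check, the mean and variance formulas given in this section can be evaluated for the three values of sumintensitiesprior, taken in the order a, b₀, b₁. This is only a sketch that applies the formulas exactly as stated above; the function name is hypothetical.

```python
def gamma_gamma_moments(a, b0, b1):
    """Evaluate E(r) = a*b1/(b0 - 1) and Var(r) = E(r)(E(r) + 1)/(b0 - 2)."""
    if b0 <= 2:
        raise ValueError("the variance formula requires b0 > 2")
    mean = a * b1 / (b0 - 1)
    variance = mean * (mean + 1) / (b0 - 2)
    return mean, variance


mean, variance = gamma_gamma_moments(6, 40, 39)
```

For "6,40,39" this gives a mean of 6 and a variance of 42/38, or about 1.1, close to the values quoted above.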

"0,1,0" specifies a flat prior on log r

"1,1,0" specifies a flat prior on r

The default, if this option is not specified, is "4,3,3"

oddsratiosprior

In a model that includes the X chromosome, this option sets the prior on the log of the odds ratio parameter psi. If we are modelling a population-level odds ratio female/male founders by setting the 'globalpsi=1' option, we can specify

oddsratiosprior = a, b

where a and b are the mean and precision (inverse variance) of a Gaussian prior on the natural log of the odds ratio. So, setting a=2 and b=100 will specify that log psi is expected to be close to 2, which corresponds to an odds ratio of e². The value for the precision must be at least 0.1.
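To see what a given pair (a, b) implies, the precision can be converted to a standard deviation on the log scale and the prior mean of log psi to the corresponding odds ratio. The helper below is a hypothetical illustration, not part of the program.

```python
import math


def odds_ratio_prior_summary(mean_log_psi, precision):
    """Summarise a Gaussian prior on log psi given its mean and precision."""
    if precision < 0.1:
        raise ValueError("the precision must be at least 0.1")
    return {
        "prior_sd_log_psi": 1 / math.sqrt(precision),
        "odds_ratio_at_mean": math.exp(mean_log_psi),
    }


summary = odds_ratio_prior_summary(2, 100)
```

With a=2 and b=100, the prior standard deviation of log psi is 0.1 and the odds ratio corresponding to the prior mean is e², about 7.39.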

For a model with individual-level odds ratios (where 'globalpsi=0'), the option requires four arguments

oddsratiosprior = a, b, c, d

where a and b have the same meaning as above, while c and d are the shape and rate parameters of a Gamma prior on the psi precision (c and d must be positive).

etapriormean, etapriorvar

Specify the prior mean and variance of the dispersion parameter(s) in a dispersion or correlated allele frequency model.

etapriorfile

File containing parameters of the gamma prior distribution specified for the allele frequency dispersion parameter eta in each subpopulation. This option can be used only when a dispersion model has been specified with the option historicallelefreqfile. This is useful when there are not enough data for the dispersion parameter to be inferred from the data, and we want to use prior information from population genetics.

This file has one row for each subpopulation (in the same order as the order of subpopulations by columns in historicallelefreqfile), and two columns specifying the shape and location parameters of the gamma distribution. Thus, for a sample from an African-American population, in which historicallelefreqfile contains counts of alleles in samples of modern west Africans (in the first column) and Europeans (in the second column), we might specify an etaprior file containing these two lines:

50 1

500 1

This specifies a prior with mean 50 for the parameter for dispersion of allele frequencies between modern unadmixed west Africans and the African gene pool in African-Americans, and a prior with mean 500 and variance 500 for the parameter for dispersion of allele frequencies between modern unadmixed Europeans and the European gene pool in African-Americans.

The dispersion parameter is related to the fixation index FST by

eta = (1 + FST) / FST, so values of 50 and 500 for eta correspond roughly to values of 0.02 and 0.002 for FST.
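The relationship above can be inverted to give FST = 1 / (dispersion - 1). The following sketch applies both directions of the conversion; the function names are hypothetical.

```python
def dispersion_from_fst(fst):
    """Dispersion parameter implied by a fixation index: (1 + FST) / FST."""
    return (1 + fst) / fst


def fst_from_dispersion(dispersion):
    """Fixation index implied by a dispersion parameter: 1 / (dispersion - 1)."""
    return 1 / (dispersion - 1)


fst_values = [fst_from_dispersion(value) for value in (50, 500)]
```

Dispersion values of 50 and 500 give FST of about 0.0204 and 0.0020 respectively, in agreement with the rounded figures quoted above.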

admixtureprior, admixtureprior1

When indadmixhiermodel = 0, each of these two options can be used to specify a Dirichlet parameter vector for parental admixture proportions. The parameter vector is specified as a string of numbers separated by commas. For instance, with a model based on 3 subpopulations:

admixtureprior = "2, 8, 3.5"

would specify the prior for parental admixture proportions (or the maternal gamete if the option 'randommatingmodel=1' has been specified) with parameter vector c(2, 8, 3.5).

admixtureprior1 can be used similarly to specify the prior for paternal admixture proportions if the option 'randommatingmodel=1' has been specified.

For example, "admixtureprior = 1,1,0" and "admixtureprior1 = 1,1,1" would specify that one parent has 2-way admixture (between subpopulations 1 and 2) and the other has 3-way admixture between subpopulations 1, 2 and 3.

If indadmixhiermodel = 1, admixtureprior can be used to specify initial values for the population admixture Dirichlet parameters.

regressionpriorprecision

Specifies the prior precision (1 / variance) of the regression parameters.

popadmixproportionsequal

Specifies that the population-level admixture proportions are to be kept equal.

6. Output Files

Output files are written to the directory specified by resultsdir.

An R script (AdmixmapOutput.R) is supplied that processes these output files to produce tables of posterior quantiles, frequency plots of the posterior distributions, convergence diagnostics and plots of the cumulative posterior means. The R script also calculates a summary slope parameter for the effect of admixture from each subpopulation, versus the others. This R script is run automatically from the Perl script (admixmap.pl) that is supplied as a wrapper for the program.

args.txt is a list of the options used by the program. This is used by the R script to identify output files and other information.

paramfile

Posterior draws of the following at intervals determined by option every:

Parameters of the Dirichlet distribution for parental admixture: one for each subpopulation
Sum of intensities for the stochastic process of transitions of ancestry on hybrid chromosomes

regparamfile

Posterior draws of intercept, slope and precision (the inverse of the residual variance) parameters in the regression model, at intervals determined by option every.

dispparamfile

Posterior draws of allele frequency dispersion parameters, one for each subpopulation, at intervals determined by option every. These are written only if option historicallelefreqfile has been specified or correlatedallelefreqs = 1.

Median and 95% credible intervals for these parameters are written to the file PosteriorQuantiles.txt.

indadmixturefile

Posterior draws of individual/gamete level variables, at intervals determined by option every, written as an R object. The outputs to this file are, in the following order:

gamete admixture proportions, ordered by subpopulations and then by gamete if a random mating model is specified. If an assortative mating model is specified only individual admixture proportions will be output.
gamete/individual sum-of-intensities if globalrho=0.
predicted value of the outcome variable in the regression model.
paternal and maternal haplotypes at this locus.

These values are written out for every individual at every iteration. This file is formatted to be read into R as a three-way array (indexed by variables, individuals, draws).

indadmixmodefile

Name of output file containing posterior estimates of the modes of individual admixture proportions and individual-level sumintensities (if globalrho=0).

allelefreqoutputfile

Posterior draws of the ancestry-specific allele or haplotype frequencies for each state of ancestry at each compound locus, at intervals determined by option every. Valid only when the allele frequencies are specified as random variables, i.e. when one of the two options priorallelefreqfile or historicallelefreqfile is specified and fixedallelefreqs is 0. These results can be used to compute new parameters for the prior distributions specified in priorallelefreqfile, which can be used in subsequent studies with independent samples.

ergodicaveragefile

Cumulative posterior means over all iterations ("ergodic averages") for the variables in paramfile, regparamfile and dispparamfile, as well as the mean and variance of the deviance, output at intervals of 10 × every iterations. Monitoring these ergodic averages allows the user to determine whether the sampler has been run long enough for the posterior means to have been estimated accurately.
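An ergodic average is simply the running mean of the sampled values. The sketch below shows the calculation for a single parameter; it illustrates the quantity being monitored and is not the program's own code. The draws are hypothetical.

```python
def ergodic_averages(draws):
    """Return the cumulative mean of the draws after each iteration."""
    averages = []
    total = 0.0
    for iteration, draw in enumerate(draws, start=1):
        total += draw
        averages.append(total / iteration)
    return averages


running_means = ergodic_averages([2.0, 4.0, 6.0])
```

When the run is long enough, successive values of the running mean should change very little.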

locusancestryprobs

Posterior marginal probabilities of each ancestry state (0, 1, or 2 copies from k-th population) at each locus in each individual. The output file name is LocusAncestryPosteriorProbs.txt. The file is formatted as an R object. To read it into R, use the command

dget(file="LocusAncestryPosteriorProbs.txt")

This will create a 4-way array indexed by individuals, loci, populations, ancestry states. You can use this to construct tests for allelic association stratified by locus ancestry, or to test for allelic association conditional on locus ancestry.

7. Tests and Diagnostics

The options below specify additional tests or output, but do not change the model itself.

Model diagnostics

chib

Set to 1 to calculate the marginal likelihood for the first individual using the Chib algorithm.

thermo

Set to 1 to use thermodynamic integration to compute marginal likelihood.

testoneindiv

Set to 1 to compute the marginal likelihood for the first individual listed in the genotypes file. This individual will not be included as part of the sample and should not be included in an outcomevarfile or covariatesfile.

stratificationtest

Set to 1 to perform a test for residual population stratification (stratification not accounted for by the fitted model).

dispersiontest

Set to 1 to perform tests for dispersion of allele frequencies between the unadmixed populations sampled and the corresponding ancestry-specific allele frequencies in the admixed population under study. This is evaluated for each subpopulation at each locus, and as a global test over all loci. This option is valid only if option priorallelefreqfile is specified. The results are "Bayesian p-values", as above.

fstoutput

This option is used only with option historicallelefreqfile (which specifies a dispersion model for allele frequencies). Under a dispersion model, the allele frequencies in unadmixed modern descendants are allowed to vary from the corresponding ancestry-specific allele frequencies in the admixed population. The variance of allele frequencies at a locus can be measured by Wright's "fixation index subpopulation-total" (F_st). In Wright's terminology, the unadmixed modern descendants and the pool of genes of corresponding ancestry in the admixed population are "subpopulations", and the "historic" population from which both these gene pools are derived is the "total" population. This differs from the terminology used in this manual, in which K "subpopulations" are specified in the model as ancestors of the admixed population.
For each locus, and each subpopulation, specifying the option fstoutput=1 will make the program output the ergodic average of the F_st value. These values can be examined as a diagnostic: a locus with an unusually large F_st value may indicate errors in coding, errors in typing, or possibly that allele frequencies in unadmixed modern descendants have diverged from the corresponding allele frequencies in the admixed population as a result of recent selection pressure.

Score tests

The allelicassociationtest, haplotypeassociationtest, ancestryassociationtest, affectedsonlytest and residualldtest each produce two files containing results of score tests obtained by averaging over the posterior distribution: a p-value file and a final table. The admixtureassoctest, allelefreqtest and hwtest only produce final tables.

The p-values, based on cumulative averages for the score and information over all posterior samples obtained after the burn-in period, are output at intervals of 10 × every. Monitoring these repeated outputs allows the user to determine when the sampler has been run long enough for the test results to be computed accurately. These files are formatted to be read into R as arrays.

The final tables, which are based on the entire posterior sample, are used for inference.

For univariate null hypotheses (testing the effect of one allele, one haplotype, or one subpopulation against all others) the test statistic is the score divided by the square root of the observed information, which has a standard normal distribution under the null hypothesis. The percent of information extracted (the ratio of observed information to complete information) measures the information obtained about the parameter under test, in comparison with the information that would be obtained if individual admixture, haplotypes at each locus, and gamete ancestry at each locus were measured without error.

For the affected-only and ancestry association score tests, the missing information can be partitioned into two components: missing information about locus ancestry, and missing information about model parameters (parental admixture). These components are tabulated separately.

For composite null hypotheses, the score U is a vector, the observed information V is a matrix, and the test statistic U'V⁻¹U (where U' denotes the transpose of U) has a chi-squared distribution under the null hypothesis.
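The test statistics described above can be sketched in a few lines of Python. The univariate function returns the standard normal statistic, a two-sided p-value and the percent of information extracted; the second function evaluates the quadratic form for a score vector of length two, using the closed-form inverse of a symmetric 2 x 2 matrix. The function names and the input values are hypothetical, and the two-sided p-value is an assumption about how the statistic would be used.

```python
import math


def univariate_score_test(score, observed_info, complete_info):
    """Return (z, two-sided p-value, percent of information extracted)."""
    z = score / math.sqrt(observed_info)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    percent_info = 100 * observed_info / complete_info
    return z, p_value, percent_info


def composite_statistic_2d(u, v):
    """Quadratic form of u with the inverse of v, for a symmetric 2 x 2 matrix v."""
    (a, b), (_, d) = v
    determinant = a * d - b * b
    u1, u2 = u
    return (d * u1 * u1 - 2 * b * u1 * u2 + a * u2 * u2) / determinant


z, p_value, percent_info = univariate_score_test(3.0, 9.0, 12.0)
chi_squared = composite_statistic_2d((1.0, 2.0), ((2.0, 0.0), (0.0, 4.0)))
```

In the composite case the statistic would be referred to a chi-squared distribution, here with two degrees of freedom.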

admixtureassoctest

Set to 1 to perform a score test for the association of the trait with individual admixture. This option is valid only if an outcome variable has been specified. The null hypothesis is no effect of individual admixture in a regression model, with covariates as explanatory variables if specified. The test statistic is computed for the effect of each subpopulation separately, with a summary chi-square test over all subpopulations if there are more than two subpopulations.

If admixtureassoctest is specified, the regression model will not include individual admixture proportions as explanatory variables, and tests for allelic association or linkage will not be adjusted for the effect of individual admixture.

allelicassociationtest

Set to 1 to perform score tests for association of the outcome variable with alleles at each simple locus, adjusting for individual admixture. The null hypothesis is no effect of the alleles or haplotypes in a regression analysis with individual admixture (and covariates if specified) as explanatory variables. The test statistic is computed for each allele or haplotype separately, with a summary chi-square statistic over all alleles or haplotypes at each locus if there are more than two alleles or haplotypes. Rare alleles or haplotypes are grouped together.
This test is appropriate when testing for association of the trait with alleles or haplotypes in a candidate gene.

haplotypeassociationtest

Set to 1 to perform score tests for association of the outcome variable with haplotypes for all compound loci containing haplotypes, adjusting for individual admixture. Valid only with allelicassociationtest.

residualldtest

Set to 1 to perform score tests for residual allelic association between pairs of unlinked loci.

ancestryassociationtest

Set to 1 to perform score tests at each compound locus for linkage with genes underlying ethnic variation in disease risk or trait values. This is a test for association of the trait with locus ancestry, adjusting for individual admixture and covariates. The null hypothesis is no effect of locus ancestry in a regression analysis with individual admixture (and covariates if specified) as explanatory variables. The test statistic is computed for the effect of each subpopulation separately, with a summary chi-square statistic over all subpopulations at each locus if there are more than two subpopulations. The proportion of information extracted depends upon the information content for ancestry of the marker locus and other nearby loci.

This test is appropriate when the objective of the study is to exploit admixture to localize genes underlying ethnic variation in the trait value, using ancestry-informative markers rather than candidate gene polymorphisms. This test should be used in a cross-sectional or cohort study design. For a case-control study of a rare disease, the affected-only test below has greater statistical power.

affectedsonlytest

Set to 1 to perform score tests for linkage with ancestry at each compound locus, based on comparing the observed and expected proportions of gene copies at each locus that have ancestry from each subpopulation. This test is calculated from affected individuals only: individuals are their own controls. This is the only test that can be used if the sample consists only of affected individuals. Even when the sample includes both cases and controls, this test is more powerful than the regression model score test in ancestryassociationtest if the disease is rare. This is because for a rare disease, the observed and expected proportion of gene copies that have ancestry from the high-risk subpopulation will not differ by very much in unaffected individuals.

In addition to the p-values and final table, this test produces a file called "AffectedsOnlyLikRatios.txt", containing likelihood ratios for the affecteds-only test at values of 0.5 and 2 for the ancestry risk ratio.

allelefreqtest

Set to 1 to perform score tests of mis-specified ancestry-specific allele frequencies. This option is valid only when the allele frequencies are fixed, i.e. when option allelefreqfile is specified or fixedallelefreqs is 1. For each compound locus and each subpopulation, a score test is computed for the null hypothesis that the frequencies of all alleles have been specified correctly. A summary chi-squared test over all subpopulations is also computed at each locus.

hwtest

Set to 1 to perform score tests for heterozygosity across loci, as a test for departure from Hardy-Weinberg equilibrium. These can be used to detect genotyping errors.