Combining score tests and Wald tests in a meta-analysis

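The test usually provided in the summary of a regression analysis is the Wald test, in which the maximum likelihood estimate \(\hat{\beta}\) is divided by its standard error \(s\):

\[ Z = \frac{\hat{\beta}}{s} \]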

Most GWAS programs compute a score test instead: the null model, including covariates (such as age, sex and principal component scores), only has to be fitted once, and the program can then loop over SNPs to calculate a score test for each one. This is much faster than fitting a regression model for each SNP tested, especially when using a linear mixed model to allow for relatedness. When the genotypes are not scored directly but imputed as probabilities, the score test can be extended to allow for this uncertainty by averaging over the posterior distribution of the genotypes. This “missing-data likelihood” score test is implemented in SNPTEST.

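The score test is based on calculating two quantities at the null value (\(\beta = 0\)): the score \(U\), which is the gradient of the log-likelihood with respect to \(\beta\), and the information \(V\), which is minus the second derivative of the log-likelihood. The test statistic is the score divided by the square root of the information, \(U / \sqrt{V}\).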

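In a large sample, the log-likelihood is asymptotically quadratic (this holds exactly if the likelihood is Gaussian). The Wald test and the score test are then algebraically equivalent, and either one can be calculated from the other:

\[ \hat{\beta} = \frac{U}{V}, \qquad s = \frac{1}{\sqrt{V}} \]

\[ U = \frac{\hat{\beta}}{s^2}, \qquad V = \frac{1}{s^2} \]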
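As a minimal sketch of this conversion, assuming the quadratic approximation holds, the two Python helpers below map between \((\hat{\beta}, s)\) and \((U, V)\). The function names and the numbers in the example are hypothetical and chosen only for illustration.

```python
import math

def wald_to_score(beta_hat, se):
    """Convert an estimate and its standard error to score U and information V."""
    info = 1.0 / se ** 2      # V = 1 / s^2
    score = beta_hat * info   # U = beta_hat / s^2
    return score, info

def score_to_wald(score, info):
    """Convert score U and information V to an estimate and its standard error."""
    beta_hat = score / info       # beta_hat = U / V
    se = 1.0 / math.sqrt(info)    # s = 1 / sqrt(V)
    return beta_hat, se

# Round trip with made-up values: both test statistics should agree.
u, v = wald_to_score(0.2, 0.05)
b, s = score_to_wald(u, v)
z_wald = 0.2 / 0.05
z_score = u / math.sqrt(v)
```

Under these assumptions the round trip recovers the original estimate and standard error, and `z_wald` and `z_score` agree up to floating-point error.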

Meta-analysis

If you have a mixture of studies reported as score tests and studies reported as maximum likelihood estimate \(\hat{\beta}\) and standard error \(s\), there are two alternative ways to combine them in a meta-analysis. The two are algebraically equivalent.

  1. Calculate the meta-analysis using score and information
  • Convert any studies reported as maximum likelihood estimates and standard errors to corresponding values of score \(U\) and information \(V\).

  • Then sum the score and information over the different studies, and calculate the test statistic as the score divided by the square root of the information.

or

  2. Calculate the meta-analysis using maximum likelihood estimates and standard errors
  • Convert any studies reported as score and information to corresponding values of maximum likelihood estimate \(\hat{\beta}\) and standard error \(s\).

Then calculate:

  • weighted average of the maximum likelihood estimates as \(\frac{\sum{\hat{\beta}_i / s_i^2}}{\sum{1 / s_i^2}}\), where the weights are the inverse variances

  • standard error of this weighted average as \(1 / \sqrt{ \sum{1/s_i^2} }\)

  • test statistic \(Z\) as the weighted average estimate of \(\beta\) divided by its standard error.
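The two routes can be checked against each other with a short Python sketch. It assumes the quadratic approximation, so that \(U = \hat{\beta}/s^2\) and \(V = 1/s^2\); the function names and the two example studies are hypothetical, with made-up values.

```python
import math

def meta_from_scores(studies):
    """studies: list of (U, V) pairs. Returns Z = sum(U) / sqrt(sum(V))."""
    u_total = sum(u for u, _ in studies)
    v_total = sum(v for _, v in studies)
    return u_total / math.sqrt(v_total)

def meta_inverse_variance(studies):
    """studies: list of (beta_hat, se) pairs. Returns (pooled beta, pooled se, Z)."""
    weights = [1.0 / se ** 2 for _, se in studies]
    pooled_beta = sum(w * b for w, (b, _) in zip(weights, studies)) / sum(weights)
    pooled_se = 1.0 / math.sqrt(sum(weights))
    return pooled_beta, pooled_se, pooled_beta / pooled_se

# Study A is reported as a score test, study B as an estimate and standard error.
study_a_score = (12.0, 100.0)   # (U, V)
study_b_wald = (0.10, 0.08)     # (beta_hat, se)

# Route 1: convert study B to (U, V), then sum scores and information.
b_beta, b_se = study_b_wald
study_b_score = (b_beta / b_se ** 2, 1.0 / b_se ** 2)
z_route1 = meta_from_scores([study_a_score, study_b_score])

# Route 2: convert study A to (beta_hat, se), then take the weighted average.
a_u, a_v = study_a_score
study_a_wald = (a_u / a_v, 1.0 / math.sqrt(a_v))
pooled_beta, pooled_se, z_route2 = meta_inverse_variance([study_a_wald, study_b_wald])
```

Because \(\sum \hat{\beta}_i / s_i^2 = \sum U_i\) and \(\sum 1 / s_i^2 = \sum V_i\), `z_route1` and `z_route2` should differ only by floating-point rounding.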

Score tests based on the missing-data likelihood to allow for uncertain imputation of genotypes

To test for association with imputed genotypes, we should allow for genotype uncertainty. A useful algorithm for constructing tests in this situation is a score test based on the missing-data likelihood. The first use of this algorithm in genetics was by David Clayton to allow for uncertainty in imputing segregation indicators in family-based association studies.

For any realization of the posterior distribution of the missing data, we can calculate the complete data score \(U\) and the information \(V\) by summing over all observations. Standard results [Dempster et al. 1977] yield the observed score as the posterior expectation of \(U\), the missing information as the posterior variance of \(U\), and the complete information as the posterior expectation of \(V\). The observed information is calculated by subtracting the missing information from the complete information. A useful by-product of this algorithm is that the ratio of observed to complete information (proportion of information extracted) can be used to assess the efficiency of the study in relation to an ideal design in which no data are missing (in this context, where all genotypes are typed or imputed with certainty).
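The calculation described above can be sketched in Python for a deliberately simplified case. The assumptions are not from the source and are made only for illustration: a linear model with no covariates, a phenotype already centred so that its mean under the null is zero, a known residual variance `sigma2`, and genotype probabilities that are independent across individuals. The function name and the example data are hypothetical, and this is not a description of how SNPTEST implements the test.

```python
import math

def missing_data_score_test(y, probs, sigma2=1.0):
    """Score test at beta = 0 with genotypes given as probabilities.

    y:     centred phenotype values (assumption: null mean is 0)
    probs: per-individual (p0, p1, p2) for genotypes coded 0, 1, 2
    """
    observed_score = 0.0   # posterior expectation of U
    missing_info = 0.0     # posterior variance of U
    complete_info = 0.0    # posterior expectation of V
    for y_i, (p0, p1, p2) in zip(y, probs):
        mean_g = p1 + 2.0 * p2     # E[g]
        mean_g2 = p1 + 4.0 * p2    # E[g^2]
        var_g = mean_g2 - mean_g ** 2
        r = y_i / sigma2           # score contribution per unit of genotype
        observed_score += r * mean_g
        missing_info += r ** 2 * var_g
        complete_info += mean_g2 / sigma2
    observed_info = complete_info - missing_info
    if observed_info <= 0.0:
        raise ValueError("observed information is not positive")
    return {
        "observed_score": observed_score,
        "observed_info": observed_info,
        "complete_info": complete_info,
        "proportion_extracted": observed_info / complete_info,
        "z": observed_score / math.sqrt(observed_info),
    }

# Made-up data: two individuals typed with certainty, two imputed with uncertainty.
y = [0.5, -1.0, 1.2, -0.7]
probs = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.1, 0.8, 0.1), (0.0, 0.2, 0.8)]
result = missing_data_score_test(y, probs)

# With every genotype known exactly, no information is missing.
certain = missing_data_score_test(y, [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
                                      (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)])
```

In this sketch `result["proportion_extracted"]` is the ratio of observed to complete information, and it equals 1 when every genotype is known with certainty, as in `certain`.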

This algorithm is implemented in the program SNPTEST, for linear or logistic regression with unrelated individuals and genotypes represented as probabilities.