Notes on the design of mendelian randomization studies


The basis of the "mendelian randomization" approach is to test hypotheses about the effect of an intermediate phenotype on disease risk by testing for association of disease with genotype at a locus that is known to perturb the intermediate phenotype.  

Typically, this approach requires three different studies to be available:

  1. a cross-sectional study from which to estimate the effect of genotype on the intermediate phenotype.  Typically the intermediate phenotype is a gene product (such as fibrinogen) or a metabolite (such as urate) for which candidate gene polymorphisms can readily be identified.  
  2. a cohort study from which to estimate the association of the intermediate phenotype (measured at baseline, or on stored samples obtained at baseline) with disease risk.  The cohort design ensures that measurements of the intermediate phenotype are not biased by disease onset.  
  3. a DNA case-control collection that can be used to test for an effect of genotype on disease risk.   From the effect of genotype on intermediate phenotype, and the association of the intermediate phenotype with disease risk, it is possible to predict the size of effect of genotype on disease risk that should be observed if the association of phenotype with disease risk is causal.  

Statistical power and sample size


The general formula for the size d of effect detectable at given power and Type 1 error probability, in a test of the null hypothesis that the effect size is zero, is

    d = (Z_{1-\alpha/2} + Z_{1-\beta}) s

where \alpha and \beta are the Type 1 and Type 2 error probabilities (\alpha taken as two-sided), Z_q is quantile q of the standard normal distribution, and s is the standard error of the effect size.
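
As a rough illustration of this calculation, the sketch below assumes the standard two-sided form d = (Z_{1-\alpha/2} + Z_{1-\beta}) s.  The function name, the default thresholds (\alpha = 0.05, power 0.9) and the example standard error are illustrative assumptions, not values taken from these notes.

```python
from statistics import NormalDist


def detectable_effect(s, alpha=0.05, power=0.9):
    """Smallest effect size detectable, given the standard error s.

    Assumes a two-sided test at Type 1 error probability alpha and
    power = 1 - beta, with a normally distributed test statistic.
    """
    z = NormalDist()  # standard normal distribution
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * s


# Hypothetical example: standard error of 0.1 on the log odds ratio scale
print(round(detectable_effect(0.1), 3))  # 0.324
```

Because d is proportional to s, anything that halves the standard error also halves the detectable effect size.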

For a logistic regression model (in which the effect is measured as the log odds ratio), s can be calculated from the Fisher information (the expectation of minus the second derivative of the log-likelihood with respect to the log odds ratio) at the null:

    s = 1 / \sqrt{N \phi (1 - \phi) v}

where N is the total number of observations, \phi is the probability of being a case, and v is the variance of the predictor variable.
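
A minimal sketch of this standard error, assuming the usual result that the Fisher information for the log odds ratio at the null is N \phi (1 - \phi) v.  The function name and the example study are hypothetical.

```python
import math


def logistic_se(N, phi, v):
    """Standard error of the log odds ratio at the null.

    N: total number of observations; phi: probability of being a case;
    v: variance of the predictor variable.
    """
    information = N * phi * (1 - phi) * v  # Fisher information at the null
    return 1 / math.sqrt(information)


# Hypothetical study: 4000 observations, 25% cases, unit-variance predictor
print(round(logistic_se(4000, 0.25, 1.0), 4))  # 0.0365
```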

For a cohort study testing for association of a rare disease (\phi close to 0, so that \phi (1 - \phi) is approximately \phi) with a quantitative trait that is scaled to have variance of 1, we have

    s \approx 1 / \sqrt{N \phi} = 1 / \sqrt{n}

where n = N \phi is the total number of cases yielded by the cohort study.  

For a case-control study with n cases and n controls (so that N = 2n and \phi = 0.5), testing for an effect on disease risk of genotype (coded as 0, 1, 2) at a SNP with allele frequency p, the variance of the predictor is v = 2p(1 - p) (assuming Hardy-Weinberg equilibrium), and we have

    s = 1 / \sqrt{n p (1 - p)}

For allele frequency 0.2, we have s = 1 / \sqrt{0.16 n} = 2.5 / \sqrt{n}.  Thus in this situation the number of cases required for a case-control study to detect the effect (measured as log odds ratio associated with one extra copy of the disease-associated allele) is 6.25 times the number of cases required for a cohort study to detect an effect of the same size (measured as log odds ratio associated with change of 1 standard deviation) of a continuous trait on disease risk.  
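
The comparison between the two designs can be checked numerically.  The sketch below assumes the Fisher information N \phi (1 - \phi) v at the null, a genotype variance of 2p(1 - p) (Hardy-Weinberg equilibrium), and the rare-disease approximation for the cohort.  All function names are illustrative.

```python
import math


def cohort_se(n_cases):
    """SE of the per-SD log odds ratio in a cohort study.

    Assumes a rare disease and a trait scaled to unit variance.
    """
    return 1 / math.sqrt(n_cases)


def case_control_se(n_cases, p):
    """SE of the per-allele log odds ratio with n cases and n controls.

    Genotype is coded 0, 1, 2; its variance 2p(1 - p) assumes
    Hardy-Weinberg equilibrium at allele frequency p.
    """
    N, phi, v = 2 * n_cases, 0.5, 2 * p * (1 - p)
    return 1 / math.sqrt(N * phi * (1 - phi) * v)


def case_ratio(p):
    """Case-control cases needed per cohort case for the same SE."""
    return 1 / (p * (1 - p))


print(round(case_ratio(0.2), 2))  # 6.25
```

For example, `case_control_se(625, 0.2)` and `cohort_se(100)` give the same standard error (0.1, up to rounding), which is the 6.25-fold ratio in numbers of cases.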

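In practice, we expect the effect of genotype on the intermediate phenotype to be modest: usually no more than 0.5 standard deviations for each extra copy of the trait-raising allele.  If the association of phenotype with disease risk is causal, the log odds ratio per extra allele copy is then no more than half the log odds ratio per standard deviation of the phenotype.  Halving the effect size requires a fourfold increase in sample size, so that the case-control collection has to be 4 x 6.25 = 25 times larger (in terms of number of cases) than the cohort study.  For a rare disease (cumulative incidence less than 1% at follow-up), the total number of individuals in the case-control collection will still be less than half the total number of individuals in the cohort study: a cohort yielding n cases needs more than 100n individuals, whereas a case-control collection of 25n cases and 25n controls has 50n.  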
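
The arithmetic in this paragraph can be sketched as follows.  Here g denotes the effect of genotype on the intermediate phenotype in standard deviations per allele copy; the function names, the symbol g, and the example cohort (1000 cases at 0.5% cumulative incidence) are assumptions introduced for illustration.

```python
def case_multiplier(p, g):
    """Case-control cases needed per cohort case.

    p: allele frequency; g: effect of genotype on the intermediate
    phenotype, in standard deviations per allele copy. Assumes the
    per-allele log odds ratio is g times the per-SD log odds ratio.
    """
    return 1 / (g ** 2 * p * (1 - p))


def total_individuals(cohort_cases, incidence, p, g):
    """Total individuals in the cohort and in the case-control collection."""
    cohort_total = cohort_cases / incidence
    cc_cases = cohort_cases * case_multiplier(p, g)
    case_control_total = 2 * cc_cases  # equal numbers of cases and controls
    return cohort_total, case_control_total


print(round(case_multiplier(0.2, 0.5), 2))  # 25.0

# Hypothetical cohort: 1000 cases at 0.5% cumulative incidence
cohort_total, cc_total = total_individuals(1000, 0.005, 0.2, 0.5)
print(round(cohort_total), round(cc_total))  # 200000 50000
```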

This disparity in required sample sizes means that it is not usually feasible to study genotype, intermediate phenotype, and disease outcome in a single cohort as in the classical "instrumental variable" approach used in the social sciences.  For instance, the EPIC study has 400 000 individuals with stored plasma samples obtained at baseline, and about 1000 cases of colon cancer at follow-up.  This is adequate to detect a standardized odds ratio of about 1.2 (log odds ratio about 0.18) for the effect of a continuous trait on disease risk, and more than adequate to test for an effect of genotype on an intermediate trait measured at baseline.  However, unless the effect of genotype on the intermediate trait is unusually large, this is nowhere near the sample size required to exploit "mendelian randomization" to test for a causal relationship between the intermediate trait and disease outcome.  Such a test requires a large case-control DNA collection.
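
To make the single-cohort limitation concrete, the sketch below compares what a cohort with 1000 cases can detect directly (trait on disease) with what it can detect through genotype.  It assumes illustrative thresholds (two-sided \alpha = 0.05, power 0.9), allele frequency 0.2, and a genotype effect of 0.5 standard deviations per allele copy; these are assumptions, not necessarily the thresholds behind the figures quoted above.  The function name is hypothetical.

```python
import math
from statistics import NormalDist


def single_cohort_detectable(n_cases, p=0.2, g=0.5, alpha=0.05, power=0.9):
    """Detectable log odds ratios in one cohort with a rare disease.

    Returns (per-SD trait effect tested directly,
             per-allele genotype effect,
             per-SD trait effect implied by that genotype effect).
    """
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    # Trait scaled to unit variance: s = 1 / sqrt(n)
    d_trait = z_sum / math.sqrt(n_cases)
    # Genotype coded 0, 1, 2 with variance 2p(1 - p): s = 1 / sqrt(n * 2p(1 - p))
    d_genotype = z_sum / math.sqrt(n_cases * 2 * p * (1 - p))
    # A causal per-SD effect of d_genotype / g produces that per-allele effect
    return d_trait, d_genotype, d_genotype / g


direct, per_allele, via_genotype = single_cohort_detectable(1000)
print(round(direct, 4), round(per_allele, 4), round(via_genotype, 4))
```

Under these assumptions the cohort can detect a per-standard-deviation odds ratio of roughly 1.11 when the trait is tested directly, but only about 1.44 when the test goes through genotype.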