Calculating Odds Ratios in R

BY IN Code, R, Tutorials NO COMMENTS YET , , , ,

The odds ratio (OR) quantifies how strongly the presence or absence of a property, such as having a particular single nucleotide polymorphism (SNP), is associated with the presence or absence of a characteristic, such as having a disease, in a given population. If the OR is greater than 1, then having the SNP is considered to be associated with having the disease ie. having the SNP increases the risk of disease. If the OR is less than 1, then having the SNP is considered to be associated with not having the disease ie. having the SNP is protective against the disease. Odds ratios are therefore commonly used in genome wide association studies (GWAS).

For more information about odds ratios, see:
The original paper: Altman DG (1991) Practical statistics for medical research. London: Chapman and Hall.
Or this nice tutorial: http://www.medcalc.org/calc/odds_ratio.php

This R code snippet shows you how to calculate odds ratios from a 2×2 contingency table at a given confidence level.

Code:

odds.ratio <- function(t, conf.level) {
    a <- t[1,1]
    b <- t[1,2]
    c <- t[2,1]
    d <- t[2,2]
    OR <- (a/b)/(c/d)
    SE <- sqrt(1/a + 1/b + 1/c + 1/d)
    z <- qnorm((1-conf.level)/2, lower.tail=F)
    return(c(exp(log(OR) - z*SE), exp(log(OR) + z*SE)))
}

Example:

## Data frame with some dummy data
set.seed(0)
d <- data.frame(
Phenotype=factor(rep(1:2, c(53,100-53)), labels=c("Disease","Control")),
SNP=factor(sample(rep(1:2, c(31,100-31))), labels=c("has SNP","no SNP"))
)

## Compute contingency table
t <- table(d)
#          SNP
# Phenotype has SNP no SNP
#  Disease      18     35
#  Control      13     34

## Compute odds ratios for different confidence levels
odds.ratio(t, 0.99)
# [1] 0.4370504 4.1395064
odds.ratio(t, 0.95)
# [1] 0.5718198 3.1638860
odds.ratio(t, 0.9)
# [1] 0.6561204 2.7573795
## Since these intervals contain 1, this SNP is not associated with the disease

## Alternatively
d <- data.frame(
Phenotype=factor(c(2,rep(1:2, c(53,99-53))), labels=c("Disease","Control")),
SNP=factor(rep(1:2, c(31,100-31)), labels=c("has SNP","no SNP"))
)
t <- table(d)
#         SNP
# Phenotype has SNP no SNP
#  Disease      30     23
#  Control       1     46
odds.ratio(t, 0.99)
# [1]   4.033249 892.580639
## This SNP is strongly associated with increased risk of the disease

So, what do you think ?