Gene Sets as R Environments

R Environments provide a useful data structure for organizing gene sets. In particular, because objects don’t lie in the same environment, the same name can be used to point to different objects depending on the environment. Therefore, one environment can store the list of genes in pathway as references by gene ontology codes while another environment can store the name of the pathway as reference by the same gene ontology code without clashing. Similarly, one environment can store all the pathways that contain a particular gene according to one database and another environment can store all the pathways that contain a particular gene according to a different database.

For example:

> load('GO_Hs_GS.RData') ## load stored environment
> gos <- ls(GO.Hs.GS2Symbol) ## list the gene sets stored in the environment
[1] "GO:0000002" "GO:0000012" "GO:0000018" "GO:0000030" "GO:0000038"
[6] "GO:0000041"
> genes <- get(gos[1], env=GO.Hs.GS2Symbol) ## get the genes in the gene set
> genes
[1] "AKT3"    "C10orf2" "DNA2"    "MEF2A"   "MPV17"   "PID1"    "SLC25A4"
[8] "TYMP"
> get(gos[1], env=GO.GS2Name) ## get the name of the gene set
[1] "mitochondrial genome maintenance (BP)"


Here is an example that creates an R environment gene set from Broad Institute’s MSigDB:

## read in Broad gmt format
library(cnvGSA)
filename <- 'msigdb.v4.0.symbols.gmt'

## number of gene sets
n <- length(gs$gs2gene) ## create environment env <- new.env(parent=globalenv()) invisible(lapply(1:n,function(i) { genes <- as.character(unlist(gs$gs2gene[i]))
name <- as.character(unlist(gs\$gs2name[i]))
assign(name,genes,envir=env)
}))

MSigDB.Hs.GS2Symbol <- env

## double check
gs.test <- ls(env=MSigDB.Hs.GS2Symbol)
gs.test[1] ## category 1
gl.test <- get(gs.test[1], env=env)
gl.test ## list of genes in gene set 1

## save environment
save(MSigDB.Hs.GS2Symbol, file="MSigDB_Hs_GS.RData")


To test this tutorial for yourself and get more gene sets, check out on GitHub: https://github.com/JEFworks/genesets