msigdbr: MSigDB Gene Sets for Multiple Organisms in a Tidy Data Format

CRAN Travis Build Status codecov

Overview

The msigdbr R package provides Molecular Signatures Database (MSigDB) gene sets typically used with the Gene Set Enrichment Analysis (GSEA) software:

Details and examples are described in the vignette.

Installation

The package can be installed from CRAN.

install.packages("msigdbr")

Usage

Load package.

library(msigdbr)

Check the available species.

msigdbr_show_species()
#>  [1] "Bos taurus"               "Caenorhabditis elegans"   "Canis lupus familiaris"  
#>  [4] "Danio rerio"              "Drosophila melanogaster"  "Gallus gallus"           
#>  [7] "Homo sapiens"             "Mus musculus"             "Rattus norvegicus"       
#> [10] "Saccharomyces cerevisiae" "Sus scrofa"

Retrieve all human gene sets.

m_df = msigdbr(species = "Homo sapiens")
head(m_df)
#> # A tibble: 6 x 9
#>   gs_name    gs_id gs_cat gs_subcat human_gene_symb… species_name entrez_gene gene_symbol
#>   <chr>      <chr> <chr>  <chr>     <chr>            <chr>              <int> <chr>      
#> 1 AAACCAC_M… M126… C3     MIR       ABCC4            Homo sapiens       10257 ABCC4      
#> 2 AAACCAC_M… M126… C3     MIR       ACTN4            Homo sapiens          81 ACTN4      
#> 3 AAACCAC_M… M126… C3     MIR       ACVR1            Homo sapiens          90 ACVR1      
#> 4 AAACCAC_M… M126… C3     MIR       ADAM9            Homo sapiens        8754 ADAM9      
#> 5 AAACCAC_M… M126… C3     MIR       ADAMTS5          Homo sapiens       11096 ADAMTS5    
#> 6 AAACCAC_M… M126… C3     MIR       AGER             Homo sapiens         177 AGER       
#> # ... with 1 more variable: sources <chr>

Retrieve mouse hallmark collection gene sets.

m_df = msigdbr(species = "Mus musculus", category = "H")
head(m_df)
#> # A tibble: 6 x 9
#>   gs_name    gs_id gs_cat gs_subcat human_gene_symb… species_name entrez_gene gene_symbol
#>   <chr>      <chr> <chr>  <chr>     <chr>            <chr>              <int> <chr>      
#> 1 HALLMARK_… M5905 H      ""        ABCA1            Mus musculus       11303 Abca1      
#> 2 HALLMARK_… M5905 H      ""        ABCB8            Mus musculus       74610 Abcb8      
#> 3 HALLMARK_… M5905 H      ""        ACAA2            Mus musculus       52538 Acaa2      
#> 4 HALLMARK_… M5905 H      ""        ACADL            Mus musculus       11363 Acadl      
#> 5 HALLMARK_… M5905 H      ""        ACADM            Mus musculus       11364 Acadm      
#> 6 HALLMARK_… M5905 H      ""        ACADS            Mus musculus       11409 Acads      
#> # ... with 1 more variable: sources <chr>