Introducing europepmc, an R interface to the Europe PMC RESTful API

Najko Jahn

2020-05-31

What is searched?

Europe PMC is a repository of life science literature. Europe PMC ingests all PubMed content and extends its index with other literature and patent sources.

For more background on Europe PMC, see:

https://europepmc.org/About

Levchenko, M., Gou, Y., Graef, F., Hamelers, A., Huang, Z., Ide-Smith, M., … McEntyre, J. (2017). Europe PMC in 2017. Nucleic Acids Research, 46(D1), D1254–D1260. https://doi.org/10.1093/nar/gkx1005

How to search Europe PMC with R?

This client supports the Europe PMC search syntax. If you are unfamiliar with searching Europe PMC, try the Europe PMC query builder, a tool that helps you construct queries. To use a Europe PMC query in R, copy and paste the search string into one of this package's search functions.

In the following, some examples demonstrate how to search Europe PMC with R.

Managing search results

By default, 100 records are returned, but the number of results can be expanded or limited with the limit parameter.

europepmc::epmc_search('"Human malaria parasites"', limit = 10)
#> # A tibble: 10 x 27
#>    id    source pmid  doi   title authorString journalTitle pubYear journalIssn
#>    <chr> <chr>  <chr> <chr> <chr> <chr>        <chr>        <chr>   <chr>      
#>  1 3247… MED    3247… 10.1… "C-t… Kimata-Arig… J Biochem    2020    "0021-924x…
#>  2 PPR1… PPR    <NA>  10.1… "A d… Cobb DW, Ku… <NA>         2020     <NA>      
#>  3 3192… MED    3192… 10.1… "Fal… Rosenthal P… Biochim Bio… 2020    "1570-9639…
#>  4 PPR9… PPR    <NA>  10.1… "Mal… Kwon H, Rey… <NA>         2019     <NA>      
#>  5 PPR9… PPR    <NA>  10.1… "Dis… Subudhi AK,… <NA>         2019     <NA>      
#>  6 PPR1… PPR    <NA>  10.2… "A r… Jivapetthai… <NA>         2019     <NA>      
#>  7 PPR6… PPR    <NA>  10.1… "Gen… McLean KJ, … <NA>         2018     <NA>      
#>  8 PPR8… PPR    <NA>  10.1… "Qua… Hopp CS, Ka… <NA>         2019     <NA>      
#>  9 PPR5… PPR    <NA>  10.1… "A m… Tang Y, Mei… <NA>         2018     <NA>      
#> 10 3149… MED    3149… 10.1… "Par… Greischar M… Evolution    2019    "0014-3820…
#> # … with 18 more variables: pubType <chr>, isOpenAccess <chr>, inEPMC <chr>,
#> #   inPMC <chr>, hasPDF <chr>, hasBook <chr>, hasSuppl <chr>,
#> #   citedByCount <int>, hasReferences <chr>, hasTextMinedTerms <chr>,
#> #   hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, issue <chr>, journalVolume <chr>,
#> #   pageInfo <chr>

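Results are sorted by relevance by default. The sort parameter offers two alternatives: sort = "cited" orders results by citation count, most cited first, and sort = "date" orders them by publication date, most recent first.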
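
To make the ranking choice explicit, the search call can be wrapped in a small helper. This is only a sketch: search_sorted() is a hypothetical name, not a function of the package, and it assumes that epmc_search() accepts "cited" and "date" for its sort argument. The actual search needs the europepmc package and network access, so it is left as a comment.

```r
# Hypothetical wrapper (not part of europepmc): checks the ranking choice
# before delegating to europepmc::epmc_search().
search_sorted <- function(query, by = c("relevance", "cited", "date"), limit = 10) {
  by <- match.arg(by)
  args <- list(query = query, limit = limit)
  # Relevance is the default ranking, so sort is only set for the alternatives.
  if (by != "relevance") args$sort <- by
  do.call(europepmc::epmc_search, args)
}

# Needs network access, so it is not run here:
# search_sorted('"Human malaria parasites"', by = "cited")
```

Validating the choice with match.arg() means a misspelled ranking fails locally, before any request is sent.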

Search by DOIs

Sometimes you want to check whether articles are indexed in Europe PMC by their DOI names, a widely used identifier for scholarly articles. Use epmc_search_by_doi() for this purpose.

my_dois <- c(
  "10.1159/000479962",
  "10.1002/sctm.17-0081",
  "10.1161/strokeaha.117.018077",
  "10.1007/s12017-017-8447-9"
  )
europepmc::epmc_search_by_doi(doi = my_dois)
#> # A tibble: 4 x 28
#>   id    source pmid  doi   title authorString journalTitle issue journalVolume
#>   <chr> <chr>  <chr> <chr> <chr> <chr>        <chr>        <chr> <chr>        
#> 1 2895… MED    2895… 10.1… Clin… Schnieder M… Eur Neurol   5-6   78           
#> 2 2894… MED    2894… 10.1… Conc… Doeppner TR… Stem Cells … 11    6            
#> 3 2901… MED    2901… 10.1… One-… Psychogios … Stroke       11    48           
#> 4 2862… MED    2862… 10.1… Defe… Carboni E, … Neuromolecu… 2-3   19           
#> # … with 19 more variables: pubYear <chr>, journalIssn <chr>, pageInfo <chr>,
#> #   pubType <chr>, isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>,
#> #   hasBook <chr>, hasSuppl <chr>, citedByCount <int>, hasReferences <chr>,
#> #   hasTextMinedTerms <chr>, hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, pmcid <chr>
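
Once the results are back, a natural follow-up is to list the DOIs that were not found. The sketch below assumes that only indexed DOIs appear in the doi column of the result; not_indexed() is a hypothetical helper, and the found vector is a stand-in for that column, made up here so the example runs without network access.

```r
my_dois <- c(
  "10.1159/000479962",
  "10.1002/sctm.17-0081",
  "10.1161/strokeaha.117.018077",
  "10.1007/s12017-017-8447-9"
)

# Hypothetical helper: DOIs are compared case-insensitively.
not_indexed <- function(requested, found) {
  requested[!(tolower(requested) %in% tolower(found))]
}

# Stand-in for res$doi, where res <- europepmc::epmc_search_by_doi(doi = my_dois);
# here we pretend that only the first two DOIs came back.
found <- c("10.1159/000479962", "10.1002/SCTM.17-0081")

missing <- not_indexed(my_dois, found)
missing
```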

Output options

By default, a non-nested data frame, printed as a tibble, is returned. Other formats are output = "id_list", which returns a list of IDs and sources, and output = "raw", which returns the full metadata as a list. Please be aware that these lists can become very large.
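
Working with the raw format usually means pulling individual fields out of a nested list. The following sketch assumes that each record is a named list and that fields missing from a record are simply absent; the records in raw_like are invented stand-ins, not real Europe PMC output, and field_or_na() is a hypothetical helper.

```r
# Stand-in for the result of
# europepmc::epmc_search("malaria", output = "raw", limit = 2)
raw_like <- list(
  list(id = "1", source = "MED", title = "A title"),
  list(id = "PPR2", source = "PPR")  # this record has no title field
)

# Hypothetical helper: return one field as character, NA when it is absent.
field_or_na <- function(record, field) {
  value <- record[[field]]
  if (is.null(value)) NA_character_ else as.character(value)
}

titles <- vapply(raw_like, field_or_na, character(1), field = "title")
titles
```

Using vapply() with an explicit NA fallback keeps the result a plain character vector of the same length as the record list.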

More advanced options to search Europe PMC

Annotations

Europe PMC provides text-mined annotations contained in abstracts and open access full-text articles.

These automatically identified concepts and terms can be retrieved at the article level:

europepmc::epmc_annotations_by_id(c("MED:28585529", "PMC:PMC1664601"))
#> # A tibble: 774 x 13
#>    source ext_id pmcid prefix exact postfix name  uri   id    type  section
#>    <chr>  <chr>  <chr> <chr>  <chr> <chr>   <chr> <chr> <chr> <chr> <chr>  
#>  1 MED    28585… PMC5… "tive… Beta… " allo… Beta… http… http… Orga… Title …
#>  2 MED    28585… PMC5… "at, … suga… " (Bet… suga… http… http… Orga… Abstra…
#>  3 MED    28585… PMC5… "d a … beet  ". "    beet  http… http… Orga… Abstra…
#>  4 MED    28585… PMC5… "lati… beets " (B. … beets http… http… Orga… Abstra…
#>  5 MED    28585… PMC5… "of <… B. v… " ssp.… B. v… http… http… Orga… Abstra…
#>  6 MED    28585… PMC5… " bee… ssp   ". mar… ssp   http… http… Gene… Abstra…
#>  7 MED    28585… PMC5… "ify … Beta… " ssp.… Beta… http… http… Orga… Abstra…
#>  8 MED    28585… PMC5… "beet… ssp   ". vul… ssp   http… http… Gene… Abstra…
#>  9 MED    28585… PMC5… "ed v… MBS   "). "   MBS   http… http… Gene… Abstra…
#> 10 MED    28585… PMC5… "2 wa… MBS   " and … MBS   http… http… Gene… Abstra…
#> # … with 764 more rows, and 2 more variables: provider <chr>, subType <chr>
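
With several hundred rows per article, a quick tally by annotation type is often the first thing to look at. The sketch below only relies on the type column shown in the output above; count_annotation_types() is a hypothetical helper, and the small data frame with placeholder type labels stands in for a real result.

```r
# Hypothetical helper: number of annotations per type, most frequent first.
count_annotation_types <- function(annotations) {
  counts <- table(annotations$type, useNA = "ifany")
  sort(counts, decreasing = TRUE)
}

# Stand-in for the data frame returned by epmc_annotations_by_id();
# the type labels are placeholders, not real Europe PMC categories.
example_annotations <- data.frame(
  exact = c("term 1", "term 2", "term 3"),
  type = c("TypeA", "TypeB", "TypeA"),
  stringsAsFactors = FALSE
)

counts <- count_annotation_types(example_annotations)
counts
```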

To obtain a list of articles for which Europe PMC has text-mined annotations, either subset the resulting data.frame

tt <- europepmc::epmc_search("malaria")
tt[tt$hasTextMinedTerms == "Y" | tt$hasTMAccessionNumbers == "Y", ]
#> # A tibble: 97 x 29
#>    id    source pmid  pmcid doi   title authorString journalTitle issue
#>    <chr> <chr>  <chr> <chr> <chr> <chr> <chr>        <chr>        <chr>
#>  1 3204… MED    3204… PMC7… 10.1… Bala… Drewry LL, … Virulence    1    
#>  2 3206… MED    3206… PMC7… 10.1… Pred… Patel H, Du… Virulence    1    
#>  3 3204… MED    3204… <NA>  10.1… Mode… Olaniyi S, … J Biol Dyn   1    
#>  4 3246… MED    3246… <NA>  10.1… Back… Xing Y, Guo… J Biol Dyn   1    
#>  5 3190… MED    3190… PMC6… 10.1… Sett… Bucşan AN, … Virulence    1    
#>  6 3236… MED    3236… PMC7… 10.1… Mate… Charlier C,… Virulence    1    
#>  7 3185… MED    3185… PMC6… 10.1… Inhi… Alissa SA, … J Enzyme In… 1    
#>  8 3207… MED    3207… <NA>  10.2… 2-Am… Serban G.    Acta Pharm   3    
#>  9 3220… MED    3220… PMC7… 10.1… Esta… Acharya KP,… Emerg Micro… 1    
#> 10 3186… MED    3186… PMC6… 10.1… Iden… Zhou Y, Wen… Pharm Biol   1    
#> # … with 87 more rows, and 20 more variables: journalVolume <chr>,
#> #   pubYear <chr>, journalIssn <chr>, pageInfo <chr>, pubType <chr>,
#> #   isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>, hasBook <chr>,
#> #   hasSuppl <chr>, citedByCount <int>, hasReferences <chr>,
#> #   hasTextMinedTerms <chr>, hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, versionNumber <int>

or extend the query with an annotation type or provider chosen from the Europe PMC Advanced Search query builder.

europepmc::epmc_search('malaria AND (ANNOTATION_TYPE:"Cell") AND (ANNOTATION_PROVIDER:"Europe PMC")')
#> # A tibble: 100 x 28
#>    id    source pmid  pmcid doi   title authorString journalTitle issue
#>    <chr> <chr>  <chr> <chr> <chr> <chr> <chr>        <chr>        <chr>
#>  1 3130… MED    3130… PMC7… 10.1… Blac… Opoka RO, W… Clin Infect… 11   
#>  2 3169… MED    3169… PMC7… 10.1… Redu… Kingston HW… J Infect Dis 9    
#>  3 3150… MED    3150… <NA>  10.1… Acut… Oshomah-Bel… J Trop Pedi… 2    
#>  4 3182… MED    3182… <NA>  10.1… CD8+… Riggle BA, … J Clin Inve… 3    
#>  5 3167… MED    3167… <NA>  10.1… A Sy… Thiengsusuk… Eur J Drug … 2    
#>  6 3104… MED    3104… <NA>  10.1… Elev… Datta D, Co… Clin Infect… 6    
#>  7 3168… MED    3168… <NA>  10.1… Eval… Ferdinand D… Trans R Soc… 3    
#>  8 3085… MED    3085… <NA>  10.1… An E… Woodford J,… J Infect Dis 6    
#>  9 3153… MED    3153… <NA>  10.1… Asso… Peitzmeier … AIDS Behav   3    
#> 10 3184… MED    3184… PMC6… 10.1… Arte… Pull L, Lup… Malar J      1    
#> # … with 90 more rows, and 19 more variables: journalVolume <chr>,
#> #   pubYear <chr>, journalIssn <chr>, pageInfo <chr>, pubType <chr>,
#> #   isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>, hasBook <chr>,
#> #   hasSuppl <chr>, citedByCount <int>, hasReferences <chr>,
#> #   hasTextMinedTerms <chr>, hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>
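
Queries of this shape can also be assembled programmatically, which avoids quoting mistakes when several annotation types are tried in turn. In this sketch, annotation_query() is a hypothetical helper. It only reproduces the ANNOTATION_TYPE and ANNOTATION_PROVIDER syntax used above and does not check whether a given type or provider exists.

```r
# Hypothetical helper: build an annotation-filtered query string.
annotation_query <- function(term, type, provider = "Europe PMC") {
  sprintf(
    '%s AND (ANNOTATION_TYPE:"%s") AND (ANNOTATION_PROVIDER:"%s")',
    term, type, provider
  )
}

query <- annotation_query("malaria", type = "Cell")
query

# The string can then be passed on (needs network access, not run here):
# europepmc::epmc_search(query)
```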

Data integrations

Another nice feature of Europe PMC is the ability to search for cross-references between Europe PMC and other databases. For instance, to get publications that are cited by entries in the Protein Data Bank in Europe and were published in 2016:

europepmc::epmc_search('(HAS_PDB:y) AND FIRST_PDATE:2016')
#> # A tibble: 100 x 28
#>    id    source pmid  pmcid doi   title authorString journalTitle issue
#>    <chr> <chr>  <chr> <chr> <chr> <chr> <chr>        <chr>        <chr>
#>  1 2803… MED    2803… PMC5… 10.1… Stru… Su HP, Rick… Proc Natl A… 3    
#>  2 2803… MED    2803… PMC5… 10.1… Stru… Kovaľ T, Øs… PLoS One     12   
#>  3 2797… MED    2797… <NA>  10.1… Comp… De Deurwaer… ACS Chem Ne… 5    
#>  4 2814… MED    2814… PMC5… 10.3… Bioc… Ulrich V, B… Beilstein J… <NA> 
#>  5 2802… MED    2802… <NA>  10.1… Stru… Zhou Z, Liu… Appl Microb… 7    
#>  6 2795… MED    2795… <NA>  10.1… Glyc… Hamark C, B… J Am Chem S… 1    
#>  7 2795… MED    2795… PMC6… 10.1… Stru… Reed AJ, Vy… J Am Chem S… 1    
#>  8 2803… MED    2803… PMC5… 10.1… Stru… Sevrioukova… Proc Natl A… 3    
#>  9 2808… MED    2808… PMC5… 10.3… Conf… Paoletti F,… Front Mol B… <NA> 
#> 10 2802… MED    2802… <NA>  10.1… Solu… Bibow S, Po… Nat Struct … 2    
#> # … with 90 more rows, and 19 more variables: journalVolume <chr>,
#> #   pubYear <chr>, journalIssn <chr>, pageInfo <chr>, pubType <chr>,
#> #   isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>, hasBook <chr>,
#> #   hasSuppl <chr>, citedByCount <int>, hasReferences <chr>,
#> #   hasTextMinedTerms <chr>, hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>

Several external databases are supported as cross-reference sources, for example the Protein Data Bank in Europe used above.

To retrieve metadata about these external database links, use europepmc::epmc_db().
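
A thin wrapper can guard such lookups against malformed input. This is a sketch under assumptions: db_links() is a hypothetical name, and epmc_db() is assumed to take the article identifier as ext_id, the database name as db, and a limit on the number of records. Which database names are accepted should be checked in the package documentation.

```r
# Hypothetical wrapper (not part of europepmc): checks the inputs locally
# before asking Europe PMC for database cross-references.
db_links <- function(ext_id, db, limit = 100) {
  stopifnot(
    is.character(ext_id), length(ext_id) == 1L,
    is.character(db), length(db) == 1L
  )
  europepmc::epmc_db(ext_id = ext_id, db = db, limit = limit)
}

# Needs network access, so it is not run here; pmid stands for the
# PubMed ID of an article of interest, given as a character string:
# db_links(pmid, db = "pdb")
```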

Citations and reference sections

Europe PMC also lets us obtain citation metadata and reference sections. To retrieve citation metadata for an article, use

europepmc::epmc_citations("9338777", limit = 500)
#> # A tibble: 232 x 11
#>    id    source citationType title authorString journalAbbrevia… pubYear volume
#>    <chr> <chr>  <chr>        <chr> <chr>        <chr>              <int> <chr> 
#>  1 3156… MED    research-ar… Regu… Chung HC, N… J Vet Sci           2019 20    
#>  2 3023… MED    research su… Bioe… Legallais C… Adv Healthc Mat…    2018 7     
#>  3 3026… MED    research su… Porc… Fiebig U, F… Xenotransplanta…    2018 25    
#>  4 2975… MED    historical … Infe… Weiss RA.    Xenotransplanta…    2018 25    
#>  5 2964… MED    research su… Trac… Kawasaki J,… Viruses             2018 10    
#>  6 2876… MED    research su… Pres… Kawasaki J,… J Virol             2017 91    
#>  7 2843… MED    research su… Thre… Colon-Moran… Virology            2017 507   
#>  8 2805… MED    research su… Anti… Inoue Y, Yo… Ann Biomed Eng      2017 45    
#>  9 2783… MED    research-ar… Tran… Kim N, Choi… PLoS One            2016 11    
#> 10 2746… MED    research su… Exis… Kuse K, Ito… J Virol             2016 90    
#> # … with 222 more rows, and 3 more variables: issue <chr>, pageInfo <chr>,
#> #   citedByCount <int>
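
Citation metadata lends itself to simple summaries, such as the number of citing articles per year. The sketch below relies only on the integer pubYear column shown above; citations_per_year() is a hypothetical helper, and the three-row data frame is a stand-in for a real epmc_citations() result.

```r
# Hypothetical helper: count citing articles per publication year.
citations_per_year <- function(citations) {
  counts <- table(citations$pubYear)
  data.frame(
    pubYear = as.integer(names(counts)),
    n = as.integer(counts)
  )
}

# Stand-in for the data frame returned by europepmc::epmc_citations()
example_citations <- data.frame(
  id = c("a", "b", "c"),
  pubYear = c(2018L, 2019L, 2018L)
)

per_year <- citations_per_year(example_citations)
per_year
```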

To retrieve the reference section of an article:

europepmc::epmc_refs("28632490", limit = 200)
#> # A tibble: 169 x 19
#>    id    source citationType title authorString journalAbbrevia… issue pubYear
#>    <chr> <chr>  <chr>        <chr> <chr>        <chr>            <chr>   <int>
#>  1 1200… MED    JOURNAL ART… Tric… Adolfsson-E… Chemosphere      9-10     2002
#>  2 1879… MED    JOURNAL ART… In v… Ahn KC, Zha… Environ. Health… 9        2008
#>  3 1855… MED    JOURNAL ART… Effe… Aiello AE, … Am J Public Hea… 8        2008
#>  4 1768… MED    JOURNAL ART… Cons… Aiello AE, … Clin. Infect. D… <NA>     2007
#>  5 1527… MED    JOURNAL ART… Rela… Aiello AE, … Antimicrob. Age… 8        2004
#>  6 1820… MED    JOURNAL ART… The … Allmyr M, H… Sci. Total Envi… 1        2008
#>  7 1700… MED    JOURNAL ART… Tric… Allmyr M, A… Sci. Total Envi… 1        2006
#>  8 2694… MED    JOURNAL ART… Pres… Alvarez-Riv… J Chromatogr A   <NA>     2016
#>  9 2319… MED    JOURNAL ART… Expo… Anderson SE… Toxicol. Sci.    1        2012
#> 10 2583… MED    JOURNAL ART… Obse… Vladar EK, … Methods Cell Bi… <NA>     2015
#> # … with 159 more rows, and 11 more variables: volume <chr>, pageInfo <chr>,
#> #   citedOrder <int>, match <chr>, issn <chr>, essn <chr>,
#> #   publicationTitle <chr>, publisherLoc <chr>, publisherName <chr>,
#> #   externalLink <chr>, doi <chr>

Fulltext access

Europe PMC gives access not only to metadata but also to full texts. Adding AND (OPEN_ACCESS:y) to your search query returns only those articles for which Europe PMC also has the full text.
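
The open access filter can be appended to any existing query string. In this sketch, open_access_only() is a hypothetical helper; it wraps the original query in parentheses so that the added AND clause applies to the whole expression.

```r
# Hypothetical helper: restrict a query to open access full texts.
open_access_only <- function(query) {
  paste0("(", query, ") AND (OPEN_ACCESS:y)")
}

oa_query <- open_access_only("malaria")
oa_query

# The restricted search itself needs network access (not run here):
# europepmc::epmc_search(oa_query, limit = 10)
```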

The full text can be accessed as an XML document via the PubMed Central ID (PMCID):

europepmc::epmc_ftxt("PMC3257301")
#> {xml_document}
#> <article article-type="research-article" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML">
#> [1] <front>\n  <journal-meta>\n    <journal-id journal-id-type="nlm-ta">PLoS  ...
#> [2] <body>\n  <sec id="s1">\n    <title>Introduction</title>\n    <p>Atmosphe ...
#> [3] <back>\n  <ack>\n    <p>We would like to thank Dr. C. Gourlay and Dr. T.  ...
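
The returned object is an xml2 document, so it can be queried with XPath. The sketch below extracts section titles; section_titles() is a hypothetical helper, and the short XML string is a hand-written stand-in that mimics the body, sec and title elements visible in the output above, so the example runs without network access.

```r
library(xml2)

# Hypothetical helper: titles of all sections in the article body.
section_titles <- function(doc) {
  xml_text(xml_find_all(doc, "//body//sec/title"))
}

# Stand-in for the document returned by europepmc::epmc_ftxt("PMC3257301")
doc <- read_xml(
  '<article>
     <body>
       <sec id="s1"><title>Introduction</title><p>Some text.</p></sec>
       <sec id="s2"><title>Methods</title><p>More text.</p></sec>
     </body>
   </article>'
)

titles <- section_titles(doc)
titles
```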