Full text

Scott Chamberlain

2020-04-07

The search functions in rplos can return the full text of an article as well as any individual section. If you would rather work with the raw XML, this vignette is for you.

Install the package from CRAN and load it

install.packages("rplos")
library('rplos')

Get full text URLs

Create URLs for full text articles in PLOS journals

Here’s the URL for the full text XML of the DOI 10.1371/journal.pone.0086169

full_text_urls(doi = '10.1371/journal.pone.0086169')
#> [1] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0086169&type=manuscript"

And for the DOI 10.1371/journal.pbio.1001845

full_text_urls(doi = '10.1371/journal.pbio.1001845')
#> [1] "http://journals.plos.org/plosbiology/article/file?id=10.1371/journal.pbio.1001845&type=manuscript"

The function is vectorized, so you can pass in many DOIs

full_text_urls(doi = c('10.1371/journal.pone.0086169', 
                       '10.1371/journal.pbio.1001845'))
#> [1] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0086169&type=manuscript"    
#> [2] "http://journals.plos.org/plosbiology/article/file?id=10.1371/journal.pbio.1001845&type=manuscript"
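
To make the URL pattern in the output above concrete, here is a minimal base R sketch that rebuilds it from a DOI. The helper sketch_full_text_url() and the journal_paths lookup are hypothetical and not part of rplos, and the lookup covers only the two journals shown above. It illustrates the pattern and does not replace full_text_urls().

```r
# Hypothetical lookup (assumption): only the two journal keys seen in this vignette.
journal_paths <- c(pone = "plosone", pbio = "plosbiology")

# Hypothetical helper, not part of rplos: rebuilds the URL pattern shown above.
sketch_full_text_url <- function(doi) {
  # Pull the journal key (e.g. "pone") out of "10.1371/journal.pone.0086169"
  key <- sub("^10\\.1371/journal\\.([a-z]+)\\..*$", "\\1", doi)
  path <- unname(journal_paths[key])
  # Unknown journals give NA rather than a guessed URL
  ifelse(
    is.na(path),
    NA_character_,
    sprintf("http://journals.plos.org/%s/article/file?id=%s&type=manuscript",
            path, doi)
  )
}

sketch_full_text_url(c("10.1371/journal.pone.0086169",
                       "10.1371/journal.pbio.1001845"))
```

Returning NA for a journal key the lookup does not know keeps the sketch from inventing URLs for journals it has no evidence for.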

Use searchplos() to get a lot of DOIs, then get the URLs for full text XML

dois <- searchplos(q = "*:*", fq = 'doc_type:full', limit = 20)$data$id
full_text_urls(dois)
#>  [1] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0020843&type=manuscript"
#>  [2] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0022257&type=manuscript"
#>  [3] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0023139&type=manuscript"
#>  [4] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0023138&type=manuscript"
#>  [5] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0023119&type=manuscript"
#>  [6] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0023113&type=manuscript"
#>  [7] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0023133&type=manuscript"
#>  [8] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0023106&type=manuscript"
#>  [9] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0023117&type=manuscript"
#> [10] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0023135&type=manuscript"
#> [11] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0023134&type=manuscript"
#> [12] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0023105&type=manuscript"
#> [13] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0020929&type=manuscript"
#> [14] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0020900&type=manuscript"
#> [15] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0020913&type=manuscript"
#> [16] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0020935&type=manuscript"
#> [17] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0020914&type=manuscript"
#> [18] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0020919&type=manuscript"
#> [19] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0020899&type=manuscript"
#> [20] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0020892&type=manuscript"

Get XML

Get full text XML of PLOS papers given a DOI

plos_fulltext(doi = '10.1371/journal.pone.0086169')
#> 1 full-text articles retrieved 
#> Min. Length: 110717 - Max. Length: 110717 
#> DOIs: 10.1371/journal.pone.0086169 ... 
#> 
#> NOTE: extract xml strings like output['<doi>']

plos_fulltext() is vectorized, so you can pass in more than one DOI

plos_fulltext(c('10.1371/journal.pone.0086169','10.1371/journal.pbio.1001845'))
#> 2 full-text articles retrieved 
#> Min. Length: 110717 - Max. Length: 143442 
#> DOIs: 10.1371/journal.pone.0086169 10.1371/journal.pbio.1001845 ... 
#> 
#> NOTE: extract xml strings like output['<doi>']

Get many DOIs, then index to get the full XML of the one you want (output not shown)

dois <- searchplos(q = "*:*", fq = 'doc_type:full', limit = 3)$data$id
out <- plos_fulltext(dois)
xml <- out[dois[1]][[1]]
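
The indexing step above combines single and double brackets, which is easy to misread. The sketch below uses a stand-in named list (out_demo, with invented placeholder XML strings) instead of a real plos_fulltext() result. It assumes the real object behaves like a named list keyed by DOI, as the printed note about output['<doi>'] suggests.

```r
# Stand-in for a plos_fulltext() result (assumption): a named list of XML strings.
# The XML strings are invented placeholders, not real article content.
out_demo <- list(
  "10.1371/journal.pone.0086169" = "<article><front/></article>",
  "10.1371/journal.pbio.1001845" = "<article><body/></article>"
)
ids <- names(out_demo)

sub_list <- out_demo[ids[1]]      # single bracket: still a list, of length 1
xml_a <- out_demo[ids[1]][[1]]    # pattern used above: the XML string itself
xml_b <- out_demo[[ids[1]]]       # same element, fetched in one step

identical(xml_a, xml_b)
```

On a plain list the two forms return the same string. The two-step form used in the vignette also works when the object defines its own single-bracket method.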

Extract the abstract from the XML

if (requireNamespace("xml2")) {
  library("xml2")
  xml_text(xml_find_all(read_xml(xml), "//abstract"))
}
#> [1] "BackgroundWolbachia are intriguing symbiotic endobacteria with a peculiar host range that includes arthropods and a single nematode family, the Onchocercidae encompassing agents of filariases. This raises the question of the origin of infection in filariae. Wolbachia infect the female germline and the hypodermis. Some evidences lead to the theory that Wolbachia act as mutualist and coevolved with filariae from one infection event: their removal sterilizes female filariae; all the specimens of a positive species are infected; Wolbachia are vertically inherited; a few species lost the symbiont. However, most data on Wolbachia and filaria relationships derive from studies on few species of Onchocercinae and Dirofilariinae, from mammals.Methodology/Principal FindingsWe investigated the Wolbachia distribution testing 35 filarial species, including 28 species and 7 genera and/or subgenera newly screened, using PCR, immunohistochemical staining, whole mount fluorescent analysis, and cocladogenesis analysis. (i) Among the newly screened Onchocercinae from mammals eight species harbour Wolbachia but for some of them, bacteria are absent in the hypodermis, or in variable density. (ii) Wolbachia are not detected in the pathological model Monanema martini and in 8, upon 9, species of Cercopithifilaria. (iii) Supergroup F Wolbachia is identified in two newly screened Mansonella species and in Cercopithifilaria japonica. (iv) Type F Wolbachia infect the intestinal cells and somatic female genital tract. (v) Among Oswaldofilariinae, Waltonellinae and Splendidofilariinae, from saurian, anuran and bird respectively, Wolbachia are not detected.Conclusions/SignificanceThe absence of Wolbachia in 63% of onchocercids, notably in the ancestral Oswaldofilariinae estimated 140 mya old, the diverse tissues or specimens distribution, and a recent lateral transfer in supergroup F Wolbachia, modify the current view on the role and evolution of the endosymbiont and their hosts. 
Further genomic analyses on some of the newly sampled species are welcomed to decipher the open questions."
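
In the output above the section headings run into the text ("BackgroundWolbachia ...") because xml_text() concatenates everything under the abstract node. The sketch below shows one way to keep headings and paragraphs apart. It uses a small invented XML snippet and assumes the abstract is split into sec elements, each with a title and a p child. Real articles may be structured differently, and some abstracts have no sections at all.

```r
library(xml2)

# Invented miniature of a structured abstract; real PLOS XML is far larger.
demo <- read_xml(
  "<article><front><abstract>
     <sec><title>Background</title><p>First paragraph.</p></sec>
     <sec><title>Conclusions</title><p>Second paragraph.</p></sec>
   </abstract></front></article>"
)

# One node per abstract section
secs <- xml_find_all(demo, "//abstract/sec")

# xml_find_first() returns one match per section, so the vectors stay aligned
titles <- xml_text(xml_find_first(secs, "./title"))
paras  <- xml_text(xml_find_first(secs, "./p"))

# Named character vector: section heading -> section text
abstract_by_section <- setNames(paras, titles)
abstract_by_section
```

To try this on a real article, replace demo with read_xml(xml), using the xml string extracted earlier.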

Extract the reference lists, returning only the first reference from each for brevity

if (requireNamespace("xml2")) {
  library("xml2")
  lapply(out[2:3], function(x){
    xml_find_all(read_xml(x), "//ref-list/ref")[[1]]
  })
}
#> $`10.1371/journal.pone.0022257`
#> {xml_node}
#> <ref id="pone.0022257-Gao1">
#> [1] <label>1</label>
#> [2] <element-citation publication-type="journal" xlink:type="simple">\n  <per ...
#> 
#> $`10.1371/journal.pone.0023139`
#> {xml_node}
#> <ref id="pone.0023139-DeLanghe1">
#> [1] <label>1</label>
#> [2] <element-citation publication-type="journal" xlink:type="simple">\n  <per ...
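
The ref nodes printed above carry an id attribute, a label child, and an element-citation child with a publication-type attribute. As a rough sketch of how those pieces could be collected into a table, the example below uses an invented two-entry reference list. The source and year children and all values are made up for illustration, and namespaced attributes such as xlink:type are left out to keep the snippet simple.

```r
library(xml2)

# Invented reference list that mimics the node layout printed above.
demo <- read_xml(
  '<back><ref-list>
     <ref id="ref-1"><label>1</label>
       <element-citation publication-type="journal">
         <source>Journal A</source><year>2010</year>
       </element-citation></ref>
     <ref id="ref-2"><label>2</label>
       <element-citation publication-type="book">
         <source>Book B</source><year>2008</year>
       </element-citation></ref>
   </ref-list></back>'
)

refs <- xml_find_all(demo, "//ref-list/ref")

# One row per reference; xml_find_first() keeps rows aligned with refs
ref_table <- data.frame(
  id    = xml_attr(refs, "id"),
  label = xml_text(xml_find_first(refs, "./label")),
  type  = xml_attr(xml_find_first(refs, "./element-citation"), "publication-type"),
  stringsAsFactors = FALSE
)
ref_table
```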