The getLattes R package, written by Roney Fraga Souza and Winicius Sabino, was built to extract data from Lattes platform curricula exported as XML.
The XML file must first be extracted from the .zip archive in which Lattes delivers it.
To automate the download process, please see Captchas Negated by Python reQuests - CNPQ.
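The extraction step mentioned above (getting the XML out of each .zip) can be scripted in base R. The following is only a minimal sketch: the helper name extract_lattes_zips and the directory names are hypothetical, and it assumes each archive holds a single XML file, which is then renamed after the archive so that several curricula can share one folder.

```r
# Hypothetical helper (not part of getLattes): pull the XML out of every
# .zip found in `zip_dir` and save it in `out_dir` as <zip name>.xml.
extract_lattes_zips <- function(zip_dir = ".", out_dir = zip_dir) {
  zips <- list.files(zip_dir, pattern = "\\.zip$", full.names = TRUE)
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  out <- character(0)
  for (z in zips) {
    # extract into a scratch directory so archives cannot overwrite each other
    tmp <- tempfile("lattes_")
    files <- utils::unzip(z, exdir = tmp)
    xmls <- files[grepl("\\.xml$", files, ignore.case = TRUE)]
    if (length(xmls) == 0) next
    # assumption: one XML per archive; only the first match is kept
    target <- file.path(
      out_dir,
      paste0(tools::file_path_sans_ext(basename(z)), ".xml")
    )
    file.copy(xmls[1], target, overwrite = TRUE)
    out <- c(out, target)
  }
  out
}

# usage (directory names are placeholders):
# xml_files <- extract_lattes_zips("downloads", "datatest")
```

The function returns the paths of the XML files it wrote, which can then be read from the output directory.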
Stable version from CRAN.
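A minimal sketch of the CRAN route, assuming the package is still published on CRAN under the name getLattes:

```r
# install from CRAN only if the package is not already available
# (assumes the CRAN package name is "getLattes")
if (!requireNamespace("getLattes", quietly = TRUE)) {
  install.packages("getLattes")
}
library(getLattes)
```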
Development version from GitHub.
# install and load devtools from CRAN
install.packages("devtools")
library(devtools)
# install and load getLattes
devtools::install_github("roneyfraga/getLattes")
library(getLattes)
# the file 4984859173592703.xml is stored in datatest directory
# cl <- readLattes(filexml='4984859173592703.xml', path='datatest/')
# import all Lattes XML files in datatest
# cls <- readLattes(filexml='*.xml$', path='datatest/')
# import all Lattes XML files in the working directory
cls <- readLattes(filexml='*.xml$')
The examples below use xmlsLattes, an R list holding two imported Lattes curricula from researchers who were important in my academic journey.
# dplyr is used to combine a list of data frames into a single data frame
library(dplyr)
# to import from one curriculum
getDadosGerais(xmlsLattes[[2]])
# to import from two or more curricula
lt <- lapply(xmlsLattes, getDadosGerais)
head(bind_rows(lt))
# to import from one curriculum
getArtigosPublicados(xmlsLattes[[2]])
# to import from two or more curricula
lt <- lapply(xmlsLattes, getArtigosPublicados)
head(bind_rows(lt))
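The lapply plus bind_rows pattern used above can be tried without any Lattes files. The sketch below uses toy data frames; the column names titulo and ano and the list names cv_a and cv_b are made up for illustration and are not the columns returned by getLattes. It also shows the .id argument of bind_rows, which records which list element each row came from, in case the combined table needs to be traced back to a curriculum.

```r
library(dplyr)

# toy stand-ins for per-curriculum results (hypothetical columns and names)
toy <- list(
  cv_a = data.frame(titulo = c("Paper 1", "Paper 2"), ano = c("2019", "2020")),
  cv_b = data.frame(titulo = "Paper 3", ano = "2021")
)

# stack the data frames; `.id` adds a column holding each list element's name
combined <- bind_rows(toy, .id = "curriculum")
combined
```

With an unnamed list, .id falls back to the element positions instead of names.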
See normalizeByDoi, normalizeByJournal and normalizeByYear to normalize publications data (journal title, ISSN and year).