getLattes

The getLattes R package, written by Roney Fraga Souza and Winicius Sabino, extracts data from curricula exported as XML from the Lattes curriculum platform.

The XML file must first be extracted from the downloaded .zip archive.
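
The extraction step can be scripted in base R. The following is a minimal sketch, not part of getLattes: the helper name `extract_lattes_zips`, its arguments, and the choice to name each extracted file after its archive are all assumptions made for illustration. It assumes each archive holds one XML file and copies it into a flat output directory, so that files with the same name in different archives do not overwrite each other.

```r
# Hypothetical helper (not provided by getLattes): extract the XML file from
# every .zip archive in `zip_dir` and store it in `out_dir` as <archive>.xml.
extract_lattes_zips <- function(zip_dir = ".", out_dir = "datatest") {
  # create the output directory if it does not exist yet
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

  # list the .zip archives to process
  zips <- list.files(zip_dir, pattern = "\\.zip$", full.names = TRUE)

  for (z in zips) {
    # unpack into a throwaway directory first
    tmp <- tempfile("lattes_")
    dir.create(tmp)
    utils::unzip(z, exdir = tmp)

    # take the first XML file found inside the archive, if any
    xml <- list.files(tmp, pattern = "\\.xml$", recursive = TRUE,
                      full.names = TRUE)
    if (length(xml) > 0) {
      # name the copy after the archive to avoid collisions
      target <- paste0(tools::file_path_sans_ext(basename(z)), ".xml")
      file.copy(xml[1], file.path(out_dir, target), overwrite = TRUE)
    }

    # remove the throwaway directory
    unlink(tmp, recursive = TRUE)
  }

  # return the XML files now available in out_dir
  list.files(out_dir, pattern = "\\.xml$", full.names = TRUE)
}
```

Calling `extract_lattes_zips("downloads", "datatest")` (both directory names are placeholders) would leave the XML files in `datatest/`, ready to be read by `readLattes`.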

To automate the download process, please see Captchas Negated by Python reQuests - CNPQ.

Installation

Stable version from CRAN.

install.packages('getLattes')
library(getLattes)

Development version from GitHub.

# install and load devtools from CRAN
install.packages("devtools")
library(devtools)

# install and load getLattes
devtools::install_github("roneyfraga/getLattes")
library(getLattes)

Import XML file as R list

# the file 4984859173592703.xml is stored in the datatest directory
# cl <- readLattes(filexml='4984859173592703.xml', path='datatest/')

# import all Lattes XML files in the datatest directory
# cls <- readLattes(filexml='*.xml$', path='datatest/')

# import all Lattes XML files in the working directory
cls <- readLattes(filexml='*.xml$')

Loaded data

The package bundles 2 Lattes curricula, from researchers important in my academic journey, already imported as an R list. To load them:

data(xmlsLattes)
length(xmlsLattes)

Import general data

# to combine a list of data frames into a single data frame
library(dplyr)

# to import from one curriculum 
getDadosGerais(xmlsLattes[[2]])

# to import from two or more curricula
lt <- lapply(xmlsLattes, getDadosGerais)
head(bind_rows(lt))
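
The `lapply` plus `bind_rows` pattern above stacks one data frame per curriculum into a single table. The sketch below shows the same pattern on toy data, so it runs without the package. The column names (`nome`, `cidade`, `pais`) and values are invented for illustration and are not the actual output of `getDadosGerais`. It also uses the `.id` argument of `bind_rows`, which records the list element each row came from.

```r
library(dplyr)

# toy stand-ins for the per-curriculum data frames returned by lapply();
# column names and values are illustrative only
lt_toy <- list(
  data.frame(nome = "A", cidade = "X", stringsAsFactors = FALSE),
  data.frame(nome = "B", pais = "Y", stringsAsFactors = FALSE)
)

# bind_rows() matches columns by name and fills missing ones with NA;
# .id adds a column identifying the source list element of each row
combined <- bind_rows(lt_toy, .id = "curriculum")
combined
```

Keeping a source identifier can help when several curricula are combined and rows must later be traced back to a researcher.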

Import Published Academic Papers

# to import from one curriculum 
getArtigosPublicados(xmlsLattes[[2]]) 

# to import from two or more curricula
lt <- lapply(xmlsLattes, getArtigosPublicados)
head(bind_rows(lt))

Normalize information

See normalizeByDoi, normalizeByJournal and normalizeByYear to normalize publication data (journal title, ISSN and year).
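
To give a sense of what normalizing publication data can involve, here is a toy sketch that harmonizes journal titles sharing the same ISSN by keeping the most frequent spelling. It is not the implementation of the getLattes functions named above, and it does not call them. The helper `most_common`, the column names `issn` and `journal`, and the data are all assumptions made for illustration.

```r
library(dplyr)

# hypothetical helper: most frequent non-missing value of a vector
# (on ties, the value seen first wins)
most_common <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_character_)
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# toy publication records with inconsistent journal titles
papers <- data.frame(
  issn    = c("1234-5678", "1234-5678", "1234-5678", "8765-4321"),
  journal = c("Journal of Examples", "J. of Examples",
              "Journal of Examples", "Other Review"),
  stringsAsFactors = FALSE
)

# within each ISSN, replace every title with the most frequent one
normalized <- papers %>%
  group_by(issn) %>%
  mutate(journal = most_common(journal)) %>%
  ungroup()

normalized
```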