Maintaining Variable Classes
R offers several ways to store a dataframe as a plain text file. Base R has write.table() and its companions such as write.csv(). Other options are data.table::fwrite(), readr::write_delim(), readr::write_csv() and readr::write_tsv(). Each of them writes a dataframe as a plain text file by converting all variables into characters. The matching read functions try to revert this conversion when reading the file. The distinction between character and factor gets lost in translation. Before R 4.0.0, read.table() converted all strings to factors by default; since R 4.0.0 it keeps them as character by default, as readr::read_csv() does. None of these functions can recover the original factor levels, because they derive the levels from the values observed in the plain text file. Hence factor levels without observations disappear. The order of the factor levels also depends on the values available in the plain text file, so it can differ from the original order.
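The loss of factor information can be illustrated with a base R round trip. This is a minimal sketch: the dataframe `df` and its columns are hypothetical example data, not part of the original text, and `stringsAsFactors = TRUE` is set explicitly so the result does not depend on the R version.

```r
# Hypothetical example data: a factor with a custom level order
# and one level ("medium") without observations.
df <- data.frame(
  id = 1:3,
  size = factor(
    c("small", "large", "small"),
    levels = c("small", "medium", "large")
  )
)

# Write to a temporary plain text file and read it back.
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# Ask explicitly for factors so the behaviour is the same on every R version.
as_factor <- read.csv(path, stringsAsFactors = TRUE)
# Ask explicitly for characters, the default since R 4.0.0.
as_character <- read.csv(path, stringsAsFactors = FALSE)

levels(df$size)          # "small" "medium" "large"
levels(as_factor$size)   # "large" "small": unused level gone, order changed
class(as_character$size) # "character"
```

In both cases the file itself carries no record of the original levels, so neither call can restore them.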
The write_vc() and read_vc() functions from git2rdata keep track of the class of each variable and, in the case of a factor, also of the factor levels and their order. Hence this pair of functions preserves the information content of the dataframe. The vc suffix stands for version control, as these functions reach their full potential in combination with a version control system.
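A round trip with write_vc() and read_vc() might look like the sketch below. It assumes the git2rdata package is installed; the dataframe `df`, the file name `"sizes"` and the temporary `root` directory are hypothetical choices for illustration. The `sorting` argument names the variable used to order the rows in the stored file.

```r
library(git2rdata)

# Hypothetical example data: a factor with a custom level order
# and one level ("medium") without observations.
df <- data.frame(
  id = 1:3,
  size = factor(
    c("small", "large", "small"),
    levels = c("small", "medium", "large")
  )
)

# Use a temporary directory as the root; it must exist before writing.
root <- tempfile("git2rdata-")
dir.create(root)

# Store the dataframe: the data and its metadata go into separate files.
write_vc(df, file = "sizes", root = root, sorting = "id")

# Read it back; classes and factor levels come from the stored metadata.
restored <- read_vc(file = "sizes", root = root)

class(restored$size)  # expected to be "factor", per the description above
levels(restored$size) # expected to match levels(df$size), including "medium"
```

Here `root` is a plain directory; it needs no version control to run, although the text above notes that the functions are meant to be used with it.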