read_xlsb
attempts to import a region from a binary format Excel workbook (xlsb)
The parameters are
Either a name, or the index of the sheet to read. Index of the first sheet is 1. If the sheet name is embedded in the range argument, or implied if range is a named range, then this argument is ignored
range can be specified as
Can be implied from the spreadsheet or specified in advanced. When specifying types, options are
When implying types from the underlying spreadsheet data, the resultant type is the regarded as the ‘least fragile’.
Effectively the order is logical – datetime – integer – double – string
Currently ‘date’ is implied from cell formatting. It is either one of the built-in Excel datetime formats or a custom format where the format string contains only the characters Y, M, D, H, S, y, m, d, h, s and - (dash), : (colon), (space), . (dot). That should be good enough to identify any dates.
If ‘date’ is specified as the column type, then any strings are converted to dates. The format I’m afraid isn’t flexible at this stage - it’s assumed to be of the type “%Y-%m-%dT%H:%M:%S”.
I think there’s a 1900 leap year bug in Excel. I haven’t corrected for that.
If any of the cells in a ‘date’ column contain time, then a POSIXct object is returned, otherwise a Date object is returned. The timezone for POSIXct is set as UTC. Seems like the sensible thing to do.
A character string that is interpret as NA. This does not effect the implied data type for a column.
Should leading and trailing whitespaces be trimmed from character strings?
The number of rows to skip before reading data.
Additional options. At present just debug = TRUE is supported. This returns a list with fields ‘result’ set to the resulting data.frame and ‘env’ set to an internal environment that may be useful for debugging.
res = read_xlsb(path = system.file("extdata", "TestBook.xlsb", package = "readxlsb"), range = "PORTFOLIO", debug = TRUE)
ls(res$env)
#> [1] "content" "named_ranges" "sheets" "stream"
res$env$named_ranges
#> name range sheet_idx first_column first_row
#> 1 INFO_RELEASE FirstSheet!$A$11 0 1 11
#> 2 OUTLOOK 'My SecondTab'!$A$1:$C$13 1 1 1
#> 3 PORTFOLIO FirstSheet!$A$3:$C$9 0 1 3
#> 4 SAVED_DATETIME FirstSheet!$C$13 0 3 13
#> 5 TITLE FirstSheet!$A$1 0 1 1
#> last_column last_row
#> 1 1 11
#> 2 3 13
#> 3 3 9
#> 4 3 13
#> 5 1 1