Rafael H. M. Pereira, Pedro R. Andrade, Joao Bazzo
27 March 2020
The gtfs2gps package allows users to convert public transport GTFS data into a single data.table of GPS-like records, which can then be used in applications such as transport simulations or scenario analyses. Before using the package, install it from GitHub.
After loading the package, GTFS data can be read into R with read_gtfs(). This function takes a zipped GTFS file and returns a list of data.table objects. The returned list contains the data of each GTFS file, indexed by file name without the extension.
library("data.table")
library("gtfs2gps")
library("magrittr") # provides the %>% pipe used further down
sao <- read_gtfs(system.file("extdata/saopaulo.zip", package = "gtfs2gps"))
names(sao)
## [1] "agency" "routes" "stops" "stop_times" "shapes"
## [6] "trips" "calendar" "frequencies"
sao$trips
## route_id service_id trip_id trip_headsign direction_id shape_id
## 1: 121G-10 USD 121G-10-0 Metrô Tucuruvi 0 52421
## 2: 148L-10 USD 148L-10-0 Lapa 0 52857
## 3: 148L-10 USD 148L-10-1 Cohab Antártica 1 52858
## 4: 1720-10 USD 1720-10-0 Cantareira 0 54502
## 5: 1720-10 USD 1720-10-1 Jd. Guancã 1 54503
## ---
## 229: N732-11 USD N732-11-0 Term. Jd. Jacira 0 51990
## 230: N739-11 USD N739-11-0 Jd. Universal 0 51954
## 231: N740-11 USD N740-11-0 Jd. Riviera 0 51939
## 232: N838-11 USD N838-11-0 Cptm Leopoldina 0 52072
## 233: N840-11 USD N840-11-0 Sta. Cecília 0 52135
Note that not all GTFS files are loaded into R. This function only loads the data needed to handle trips and stops in space and time:
- agency.txt
- calendar.txt
- routes.txt
- shapes.txt
- stop_times.txt
- stops.txt
- trips.txt
- frequencies.txt (this last one is optional)
If a zipped GTFS file does not contain all of the required files, read_gtfs() stops with an error.
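The required-file check described above can be pictured with a short base R sketch. This is not the package's code: check_required_sketch() and the required vector are hypothetical names used only for illustration, and the set of required tables simply mirrors the list given in the text.

```r
# Tables the text lists as required (frequencies is optional).
required <- c("agency", "calendar", "routes", "shapes",
              "stop_times", "stops", "trips")

# Hypothetical helper for illustration; not part of gtfs2gps.
check_required_sketch <- function(file_names) {
  # Index tables by file name without extension, e.g. "stops.txt" -> "stops".
  tables <- tools::file_path_sans_ext(basename(file_names))
  absent <- setdiff(required, tables)
  if (length(absent) > 0) {
    stop("Missing required GTFS files: ",
         paste0(absent, ".txt", collapse = ", "))
  }
  tables
}

# A complete feed passes and returns the table names.
complete <- paste0(c(required, "frequencies"), ".txt")
check_required_sketch(complete)

# Dropping shapes.txt triggers the error; tryCatch captures its message.
msg <- tryCatch(check_required_sketch(setdiff(complete, "shapes.txt")),
                error = function(e) conditionMessage(e))
msg
```

Checking names before parsing any table is one plausible design, since an incomplete feed then fails early and the message names the missing files.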
GTFS data sets can be fairly large for complex public transport networks and, in some cases, users might want to focus on specific transport services on weekdays or weekends, or on specific trips or routes. The package provides functions to filter GTFS data and speed up processing.
These functions subset all the relevant GTFS tables to remove unnecessary rows while keeping the data consistent. Each of the four functions returns a list of data.table objects, in the same format as the input data. For example, the code below keeps only three shape ids: 51338, 51956 and 51657.
object.size(sao) %>% format(units = "Kb")
## [1] "6227.2 Kb"
sao_small <- gtfs2gps::filter_by_shape_id(sao, c(51338, 51956, 51657))
object.size(sao_small) %>% format(units = "Kb")
## [1] "105.8 Kb"
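To illustrate what "keeping the data consistent" could mean, here is a minimal base R sketch on a toy GTFS-like list. It is an assumption about the general idea, not the implementation of filter_by_shape_id(): the helper filter_shapes_sketch() and all values are made up, and plain data frames are used instead of data.table to keep the example dependency-free.

```r
# Toy GTFS-like list; every value here is invented for illustration.
gtfs <- list(
  shapes = data.frame(shape_id = c(1, 1, 2, 2, 3, 3),
                      shape_pt_sequence = c(1, 2, 1, 2, 1, 2)),
  trips = data.frame(trip_id = c("a", "b", "c"),
                     route_id = c("r1", "r1", "r2"),
                     shape_id = c(1, 2, 3)),
  stop_times = data.frame(trip_id = c("a", "a", "b", "c"),
                          stop_id = c("s1", "s2", "s2", "s3")),
  stops = data.frame(stop_id = c("s1", "s2", "s3")),
  routes = data.frame(route_id = c("r1", "r2"))
)

# Hypothetical helper: filter shapes, then cascade to the dependent tables.
filter_shapes_sketch <- function(gtfs, shape_ids) {
  gtfs$shapes <- gtfs$shapes[gtfs$shapes$shape_id %in% shape_ids, , drop = FALSE]
  # Keep only trips that run on the remaining shapes.
  gtfs$trips <- gtfs$trips[gtfs$trips$shape_id %in% shape_ids, , drop = FALSE]
  # Keep only stop times of the remaining trips.
  gtfs$stop_times <- gtfs$stop_times[
    gtfs$stop_times$trip_id %in% gtfs$trips$trip_id, , drop = FALSE]
  # Keep only stops and routes still referenced by the remaining rows.
  gtfs$stops <- gtfs$stops[
    gtfs$stops$stop_id %in% gtfs$stop_times$stop_id, , drop = FALSE]
  gtfs$routes <- gtfs$routes[
    gtfs$routes$route_id %in% gtfs$trips$route_id, , drop = FALSE]
  gtfs
}

small <- filter_shapes_sketch(gtfs, shape_ids = 1)
sapply(small, nrow)
```

After filtering on shape 1, every remaining row in trips, stop_times, stops and routes is still referenced by another table, which is the property that keeps the subset usable as a GTFS feed.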
We can then convert the data to simple feature format and plot it.
sao_small_shapes_sf <- gtfs2gps::gtfs_shapes_as_sf(sao_small)
sao_small_stops_sf <- gtfs2gps::gtfs_stops_as_sf(sao_small)
plot(sf::st_geometry(sao_small_shapes_sf))
plot(sf::st_geometry(sao_small_stops_sf), pch = 20, col = "red", add = TRUE)
box()
After subsetting the data, it is also possible to save it as a new GTFS file using write_gtfs(), as shown below.
write_gtfs(sao_small, "sao_small.zip")
To convert GTFS to GPS-like format, use gtfs2gps(). This is the core function of the package. It takes a zipped GTFS file as input and returns a data.table where each row represents a ‘GPS-like’ data point for every trip in the GTFS file. In summary, this function interpolates the space-time position of each vehicle in each trip considering the network distance and average speed between stops. By default, the function samples the position and timestamp of each vehicle every 15 m, but the user can set a different value with the spatial_resolution argument. See the example below.
sao_gps <- gtfs2gps("sao_small.zip", progress = FALSE, parallel = FALSE, spatial_resolution = 50)
head(sao_gps)
## trip_id route_type id shape_pt_lon shape_pt_lat departure_time stop_id
## 1: 5010-10-0 3 1 -46.63120 -23.66268 04:00:01 3703053
## 2: 5010-10-0 3 2 -46.63117 -23.66273 04:00:03 <NA>
## 3: 5010-10-0 3 3 -46.63108 -23.66288 04:00:08 <NA>
## 4: 5010-10-0 3 4 -46.63095 -23.66316 04:00:13 <NA>
## 5: 5010-10-0 3 5 -46.63082 -23.66345 04:00:18 <NA>
## 6: 5010-10-0 3 6 -46.63111 -23.66364 04:00:23 <NA>
## stop_sequence dist cumdist speed cumtime shape_id
## 1: 1 7.230445 7.230445 26.5931 0.9788103 51338
## 2: NA 18.369274 25.599720 26.5931 3.4655221 51338
## 3: NA 34.505965 60.105685 26.5931 8.1367134 51338
## 4: NA 34.505965 94.611650 26.5931 12.8079046 51338
## 5: NA 36.478776 131.090426 26.5931 17.7461620 51338
## 6: NA 36.478776 167.569201 26.5931 22.6844194 51338
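The interpolation idea can be sketched in a few lines of base R for a single segment between two stops. This is a simplified illustration under stated assumptions, not the package's algorithm: the distances and times are made up, points are sampled on a straight distance axis rather than along a real shape, and the column names only echo those in the output above.

```r
# Made-up values for two consecutive stops of one trip.
stop_cumdist <- c(0, 400)   # cumulative network distance at each stop (m)
stop_time    <- c(0, 60)    # departure time at each stop (s after trip start)
spatial_resolution <- 50    # spacing between sampled points (m)

# Average speed between the two stops, in m/s.
speed_ms <- diff(stop_cumdist) / diff(stop_time)

# Sample positions along the segment every spatial_resolution metres.
cumdist <- seq(stop_cumdist[1], stop_cumdist[2], by = spatial_resolution)

# With constant speed, the timestamp grows linearly with distance.
cumtime <- stop_time[1] + (cumdist - stop_cumdist[1]) / speed_ms

gps_like <- data.frame(id = seq_along(cumdist),
                       cumdist = cumdist,
                       speed_kmh = speed_ms * 3.6,  # converted from m/s to km/h
                       cumtime = cumtime)
gps_like
```

A smaller spatial_resolution yields more rows per segment, so it trades finer positions for a larger output table.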
The following figure maps the first 100 data points of the sample data we processed. They can be converted to simple feature points or linestrings.
sao_gps60 <- sao_gps[1:100, ]
# points
sao_gps60_sfpoints <- gps_as_sfpoints(sao_gps60)
# linestring
sao_gps60_sflinestring <- gps_as_sflinestring(sao_gps60)
# plot
plot(sf::st_geometry(sao_gps60_sfpoints), pch = 20)
plot(sf::st_geometry(sao_gps60_sflinestring), col = "blue", add = TRUE)
box()
The function gtfs2gps() automatically recognizes whether the GTFS data brings detailed stop_times.txt information or whether it is a frequencies.txt GTFS file. An example using a GTFS feed with detailed stop_times.txt can be found below:
poa <- system.file("extdata/poa.zip", package = "gtfs2gps")
poa_gps <- gtfs2gps(poa, progress = FALSE, parallel = FALSE, spatial_resolution = 50)
# convert to sf points (the object holds points, so it is named accordingly)
poa_gps_sfpoints <- gps_as_sfpoints(poa_gps)
plot(sf::st_geometry(poa_gps_sfpoints[1:200,]))
box()
For a given trip, the function gtfs2gps() calculates the average speed between each pair of consecutive stops. This speed is the ratio of the change in cumulative network distance S to the change in departure time t between a consecutive pair of valid stop_ids (i and i+1):
$$\text{Speed}_i = \frac{S_{i+1} - S_i}{t_{i+1} - t_i}$$
Since each trip usually starts before the first stop_id, the speed on that initial stretch cannot be calculated with the previous equation, because there is no information for period i. In this case, the function uses the mean speed for the whole trip. The same happens after the last valid stop_id (N) of the trip, where information for i+1 does not exist.
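A small worked example may help make the speed rule concrete. The numbers are invented, and the way the whole-trip mean speed is computed here (distance between the first and last valid stops divided by the elapsed time) is an assumption made for illustration; the package may define it differently.

```r
# Made-up values for the three valid stops of one trip.
S     <- c(300, 1500, 2700)  # cumulative network distance at each stop (m)
t_dep <- c(60, 240, 540)     # departure time at each stop (s)

# Speed_i between consecutive stops i and i+1, in m/s.
speed_between <- diff(S) / diff(t_dep)

# Assumption: whole-trip mean speed measured from first to last valid stop.
n <- length(S)
speed_trip <- (S[n] - S[1]) / (t_dep[n] - t_dep[1])

# Stretches before the first stop and after the last stop have no
# stop pair to use, so they fall back to the whole-trip mean speed.
speed <- c(before_first = speed_trip,
           stop_1_to_2  = speed_between[1],
           stop_2_to_3  = speed_between[2],
           after_last   = speed_trip)
round(speed, 2)
```

With these numbers the two between-stop speeds are about 6.67 m/s and 4 m/s, and both end stretches use the fallback of 5 m/s.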
If you have any suggestions or want to report an error, please visit the GitHub page of the package.