Sometimes you need to publish a figure in a vector format:
library(ggplot2)
library(ggrastr)
points_num <- 50
df <- data.frame(x=rnorm(points_num), y=rnorm(points_num), c=as.factor(1:points_num %% 2))
gg <- ggplot(df, aes(x=x, y=y, color=c)) + scale_color_discrete(guide="none")
(gg_vec <- gg + geom_point(size=0.5))
But in other cases your figure contains thousands of points. For example, try points_num <- 500000
in the example above and you will notice performance issues.
In this case, a reasonable solution is to rasterize the plot, but then all text becomes raster as well.
Raster layers were developed to avoid this: only the layer itself is rasterized, while text stays vector:
(gg_rast <- gg + geom_point_rast(size=0.5))
The plots look the same, but the difference shows up when they are exported to PDF. Unfortunately, the price is a longer rendering time.
PrintFileSize <- function(gg, name) {
  invisible(ggsave('tmp.pdf', gg, width = 4, height = 4))
  cat(name, ': ', file.info('tmp.pdf')$size / 1024, ' Kb.\n', sep = '')
  unlink('tmp.pdf')
}
PrintFileSize(gg_rast, 'Raster')
#> Raster: 24.20898 Kb.
PrintFileSize(gg_vec, 'Vector')
#> Vector: 7.392578 Kb.
With only 50 points the raster file is the larger one, but the balance reverses as the number of points grows:
points_num <- 1000000
df <- data.frame(x=rnorm(points_num), y=rnorm(points_num), c=as.factor(1:points_num %% 2))
gg <- ggplot(df, aes(x=x, y=y, color=c)) + scale_color_discrete(guide="none")
gg_vec <- gg + geom_point(size=0.5)
gg_rast <- gg + geom_point_rast(size=0.5)
PrintFileSize(gg_rast, 'Raster')
#> Raster: 483.4795 Kb.
PrintFileSize(gg_vec, 'Vector')
#> Vector: 54774.81 Kb.
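The rendering-time cost mentioned earlier can be checked in the same way as the file size. The following is only a rough sketch: `TimeExport` is a hypothetical helper (not part of ggrastr), the point count of 5000 is an arbitrary choice, and the timings depend heavily on the machine and graphics device.

```r
library(ggplot2)
library(ggrastr)

# Hypothetical helper (not part of ggrastr): seconds that ggsave() needs
# to write the plot to a temporary PDF.
TimeExport <- function(gg) {
  path <- tempfile(fileext = ".pdf")
  on.exit(unlink(path))  # remove the temporary file when the function exits
  system.time(ggsave(path, gg, width = 4, height = 4))[["elapsed"]]
}

set.seed(1)
n <- 5000  # arbitrary size, chosen only to keep the example fast
df <- data.frame(x = rnorm(n), y = rnorm(n))
gg <- ggplot(df, aes(x = x, y = y))

cat("Vector: ", TimeExport(gg + geom_point(size = 0.5)), " s\n", sep = "")
cat("Raster: ", TimeExport(gg + geom_point_rast(size = 0.5)), " s\n", sep = "")
```

A single run is noisy, so repeating the measurement a few times gives a more reliable comparison.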
Heatmaps also don't work well with vector formats:
coords <- expand.grid(1:100, 1:100)
coords$Value <- 1 / apply(as.matrix(coords), 1, function(x) sum((x - c(50, 50))^2)^0.01)
ggplot(coords) + geom_tile(aes(x=Var1, y=Var2, fill=Value))
ggplot(coords) + geom_tile_rast(aes(x=Var1, y=Var2, fill=Value))
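The effect on heatmaps can be checked with the same file-size comparison used for points. This is a sketch under assumptions: `PdfSizeKb` is a hypothetical helper that writes to a temporary file, and the fill value is a vectorized form of the formula above. No sizes are quoted here because they vary with the device and package versions.

```r
library(ggplot2)
library(ggrastr)

# Hypothetical helper: size in Kb of the PDF that ggsave() writes for a plot.
PdfSizeKb <- function(gg) {
  path <- tempfile(fileext = ".pdf")
  on.exit(unlink(path))  # remove the temporary file when the function exits
  ggsave(path, gg, width = 4, height = 4)
  file.info(path)$size / 1024
}

coords <- expand.grid(Var1 = 1:100, Var2 = 1:100)
# Vectorized form of the fill value computed with apply() above.
coords$Value <- 1 / ((coords$Var1 - 50)^2 + (coords$Var2 - 50)^2)^0.01

gg <- ggplot(coords, aes(x = Var1, y = Var2, fill = Value))
sizes <- c(Vector = PdfSizeKb(gg + geom_tile()),
           Raster = PdfSizeKb(gg + geom_tile_rast()))
print(sizes)
```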
Another type of plot with a potentially large number of small objects is geom_boxplot:
points_num <- 100
df <- data.frame(x=as.factor(1:points_num %% 2), y=log(abs(rcauchy(points_num))))
gg <- ggplot(df, aes(x=x, y=y)) + scale_color_discrete(guide="none")
gg + geom_boxplot()
Try the above example with points_num <- 100000
or a larger integer: it will take quite long to render, far longer than CRAN allows.
With a large number of objects, outlier points become uninformative. It's better to jitter them:
gg_vec <- gg + geom_boxplot_jitter(outlier.size=0.1, outlier.jitter.width = 0.3, outlier.alpha=0.5)
gg_vec
And this geom can be rasterized as well:
gg_rast <- gg + geom_boxplot_jitter(outlier.size=0.1, outlier.jitter.width = 0.3, outlier.alpha=0.5, raster=TRUE, raster.dpi = 200)
gg_rast
PrintFileSize(gg_rast, 'Raster')
#> Raster: 10.99609 Kb.
PrintFileSize(gg_vec, 'Vector')
#> Vector: 4.972656 Kb.
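To get a feeling for how many outlier points a boxplot of such heavy-tailed data has to draw, they can be counted without plotting anything. This is a minimal sketch: `CountOutliers` is a hypothetical helper, and it relies on `grDevices::boxplot.stats`, whose default 1.5 × IQR rule is assumed here to be close to what `geom_boxplot` uses.

```r
# Hypothetical helper: number of points beyond the boxplot whiskers
# (default 1.5 * IQR rule of grDevices::boxplot.stats).
CountOutliers <- function(y) {
  length(grDevices::boxplot.stats(y)$out)
}

set.seed(42)
for (n in c(100L, 10000L, 100000L)) {
  y <- log(abs(rcauchy(n)))  # same distribution as in the example above
  cat(n, " points: ", CountOutliers(y), " outliers\n", sep = "")
}
```

Each of these outliers is drawn as a separate point, which is why jittering and rasterizing them helps.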
In the current version, legends can distort raster plots:
points_num <- 10
df <- data.frame(x=rnorm(points_num), y=rnorm(points_num), c=as.factor(1:points_num %% 2))
ggplot(df, aes(x=x, y=y, color=c)) + geom_point_rast(size=0.5)
To restore the aspect ratio, the width and height parameters can be used (here, raster.width):
points_num <- 10
df <- data.frame(x=rnorm(points_num), y=rnorm(points_num), c=as.factor(1:points_num %% 2))
ggplot(df, aes(x=x, y=y, color=c)) + geom_point_rast(size=0.5, raster.width = 1)