User Guide: 1 Debugging ggplots

‘gginnards’ 0.0.3

Pedro J. Aphalo

2019-11-26

Preliminaries

library(gginnards)
## Loading required package: ggplot2
library(tibble)

We generate some artificial data.

set.seed(4321)
# generate artificial data
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x, 
                      y, 
                      group = c("A", "B"), 
                      y2 = y * c(0.5, 2),
                      block = c("a", "a", "b", "b"))

We change the default theme to an uncluttered one.

old_theme <- theme_set(theme_bw())

ggplot construction

Package ‘ggplot2’ defines its own class system, and function ggplot() can be considered as a constructor.

class(ggplot())
## [1] "gg"     "ggplot"

These objects contain all the information needed to render a plot into graphical output, but not the rendered plot itself. They are list-like objects with heterogeneous named members.
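
As a minimal sketch of what "list-like" means in practice, members can be read by name with `$`, as with a named list. This assumes a ‘ggplot2’ 3.x release, as used in this guide; the internal representation may differ in other versions.

```r
library(ggplot2)

p <- ggplot()

# Members are accessed by name, as in a named list
p$layers          # no layers have been added yet
class(p$facet)    # a ggproto object describing the faceting

# Creating the object draws nothing; we only inspect its members here
n_layers <- length(p$layers)
n_layers
```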

The structure of objects of class "ggplot" can be explored with R’s method str(), as is the case for any structured R object. Package ‘gginnards’ defines a specialization of str() for class "ggplot". Our str() allows us to see the different slots of this special type of list. It differs from the default str() method in the values of its default arguments and in the ability to control which components or members are displayed.

We will use str() to follow the step-by-step construction of a "ggplot" object.

If we pass no arguments to the ggplot() constructor, an empty plot is rendered when we print it.

p0 <- ggplot()
p0

Object p0 contains members, but "data", "layers", "mapping", "theme" and "labels" are empty lists.

str(p0)
## Object size: 3.4 kB
## List of 9
##  $ data       : list()
##  $ layers     : list()
##  $ scales     :Classes 'ScalesList', 'ggproto', 'gg' <ggproto object: Class ScalesList, gg>
##     add: function
##     clone: function
##     find: function
##     get_scales: function
##     has_scale: function
##     input: function
##     n: function
##     non_position_scales: function
##     scales: NULL
##     super:  <ggproto object: Class ScalesList, gg> 
##  $ mapping    : Named list()
##  $ theme      : list()
##  $ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto', 'gg' <ggproto object: Class CoordCartesian, Coord, gg>
##     aspect: function
##     backtransform_range: function
##     clip: on
##     default: TRUE
##     distance: function
##     expand: TRUE
##     is_free: function
##     is_linear: function
##     labels: function
##     limits: list
##     modify_scales: function
##     range: function
##     render_axis_h: function
##     render_axis_v: function
##     render_bg: function
##     render_fg: function
##     setup_data: function
##     setup_layout: function
##     setup_panel_params: function
##     setup_params: function
##     transform: function
##     super:  <ggproto object: Class CoordCartesian, Coord, gg> 
##  $ facet      :Classes 'FacetNull', 'Facet', 'ggproto', 'gg' <ggproto object: Class FacetNull, Facet, gg>
##     compute_layout: function
##     draw_back: function
##     draw_front: function
##     draw_labels: function
##     draw_panels: function
##     finish_data: function
##     init_scales: function
##     map_data: function
##     params: list
##     setup_data: function
##     setup_params: function
##     shrink: TRUE
##     train_scales: function
##     vars: function
##     super:  <ggproto object: Class FacetNull, Facet, gg> 
##  $ plot_env   :<environment: R_GlobalEnv> 
##  $ labels     : Named list()

If we pass an argument to parameter data, the data are copied into the list slot named data. As we also map the data to aesthetics, this mapping is stored in slot mapping.

p1 <- ggplot(data = my.data, aes(x, y, colour = group))
str(p1)
## Object size: 11 kB
## List of 9
##  $ data       :'data.frame': 100 obs. of  5 variables:
##  $ layers     : list()
##  $ scales     :Classes 'ScalesList', 'ggproto', 'gg' <ggproto object: Class ScalesList, gg>
##     add: function
##     clone: function
##     find: function
##     get_scales: function
##     has_scale: function
##     input: function
##     n: function
##     non_position_scales: function
##     scales: NULL
##     super:  <ggproto object: Class ScalesList, gg> 
##  $ mapping    :List of 3
##  $ theme      : list()
##  $ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto', 'gg' <ggproto object: Class CoordCartesian, Coord, gg>
##     aspect: function
##     backtransform_range: function
##     clip: on
##     default: TRUE
##     distance: function
##     expand: TRUE
##     is_free: function
##     is_linear: function
##     labels: function
##     limits: list
##     modify_scales: function
##     range: function
##     render_axis_h: function
##     render_axis_v: function
##     render_bg: function
##     render_fg: function
##     setup_data: function
##     setup_layout: function
##     setup_panel_params: function
##     setup_params: function
##     transform: function
##     super:  <ggproto object: Class CoordCartesian, Coord, gg> 
##  $ facet      :Classes 'FacetNull', 'Facet', 'ggproto', 'gg' <ggproto object: Class FacetNull, Facet, gg>
##     compute_layout: function
##     draw_back: function
##     draw_front: function
##     draw_labels: function
##     draw_panels: function
##     finish_data: function
##     init_scales: function
##     map_data: function
##     params: list
##     setup_data: function
##     setup_params: function
##     shrink: TRUE
##     train_scales: function
##     vars: function
##     super:  <ggproto object: Class FacetNull, Facet, gg> 
##  $ plot_env   :<environment: R_GlobalEnv> 
##  $ labels     :List of 3
str(p1, max.level = 2, components = "data")
## Object size: 5.3 kB
## List of 1
##  $ data:'data.frame':    100 obs. of  5 variables:
##   ..$ x    : int [1:100] 1 2 3 4 5 ...
##   ..$ y    : num [1:100] -27205 -14243 ...
##   ..$ group: Factor w/ 2 levels "A","B": 1 2 1 2 1 ...
##   ..$ y2   : num [1:100] -13603 -28485 ...
##   ..$ block: Factor w/ 2 levels "a","b": 1 1 2 2 1 ...
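
The same members can also be inspected directly, without str(). The sketch below uses a small made-up data frame (df is not part of the original example) and assumes that `$` access to plot members works as in ‘ggplot2’ 3.x.

```r
library(ggplot2)

# Small stand-in data set, used only for this illustration
df <- data.frame(x = 1:4, y = c(2, 4, 3, 5), group = c("A", "B", "A", "B"))
p <- ggplot(data = df, aes(x, y, colour = group))

# The data frame passed to ggplot() is stored as member "data"
dim(p$data)
names(p$data)

# The mapping is a named list with one entry per aesthetic
aes_names <- names(p$mapping)
aes_names
```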

A geometry adds a layer.

p2 <- p1 + geom_point()
str(p2)
## Object size: 11.5 kB
## List of 9
##  $ data       :'data.frame': 100 obs. of  5 variables:
##  $ layers     :List of 1
##  $ scales     :Classes 'ScalesList', 'ggproto', 'gg' <ggproto object: Class ScalesList, gg>
##     add: function
##     clone: function
##     find: function
##     get_scales: function
##     has_scale: function
##     input: function
##     n: function
##     non_position_scales: function
##     scales: list
##     super:  <ggproto object: Class ScalesList, gg> 
##  $ mapping    :List of 3
##  $ theme      : list()
##  $ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto', 'gg' <ggproto object: Class CoordCartesian, Coord, gg>
##     aspect: function
##     backtransform_range: function
##     clip: on
##     default: TRUE
##     distance: function
##     expand: TRUE
##     is_free: function
##     is_linear: function
##     labels: function
##     limits: list
##     modify_scales: function
##     range: function
##     render_axis_h: function
##     render_axis_v: function
##     render_bg: function
##     render_fg: function
##     setup_data: function
##     setup_layout: function
##     setup_panel_params: function
##     setup_params: function
##     transform: function
##     super:  <ggproto object: Class CoordCartesian, Coord, gg> 
##  $ facet      :Classes 'FacetNull', 'Facet', 'ggproto', 'gg' <ggproto object: Class FacetNull, Facet, gg>
##     compute_layout: function
##     draw_back: function
##     draw_front: function
##     draw_labels: function
##     draw_panels: function
##     finish_data: function
##     init_scales: function
##     map_data: function
##     params: list
##     setup_data: function
##     setup_params: function
##     shrink: TRUE
##     train_scales: function
##     vars: function
##     super:  <ggproto object: Class FacetNull, Facet, gg> 
##  $ plot_env   :<environment: R_GlobalEnv> 
##  $ labels     :List of 3
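
A short sketch of how member layers grows as geometries are added. The object names are chosen for this illustration only, and the built-in mpg data from ‘ggplot2’ stands in for my.data.

```r
library(ggplot2)

p_empty  <- ggplot(mpg, aes(cyl, hwy))
p_points <- p_empty + geom_point()
p_both   <- p_points + geom_line()

# Each geometry appends one element to the "layers" list;
# the operator + returns a new object and leaves its operands unchanged
length(p_empty$layers)
length(p_points$layers)
length(p_both$layers)

# Each layer records its geom and stat as ggproto objects
class(p_points$layers[[1]]$geom)[1]
class(p_points$layers[[1]]$stat)[1]
```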

A summary() method that produces a more compact output is available in recent versions of ‘ggplot2’. However, it does not reveal the internal structure of the objects.

summary(p2)
## data: x, y, group, y2, block [100x5]
## mapping:  x = ~x, y = ~y, colour = ~group
## faceting: <ggproto object: Class FacetNull, Facet, gg>
##     compute_layout: function
##     draw_back: function
##     draw_front: function
##     draw_labels: function
##     draw_panels: function
##     finish_data: function
##     init_scales: function
##     map_data: function
##     params: list
##     setup_data: function
##     setup_params: function
##     shrink: TRUE
##     train_scales: function
##     vars: function
##     super:  <ggproto object: Class FacetNull, Facet, gg>
## -----------------------------------
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity
str(p2, max.level = 2, components = "mapping")
## Object size: 3 kB
## List of 1
##  $ mapping:List of 3
##   ..$ x     : language ~x
##   ..$ y     : language ~y
##   ..$ colour: language ~group
p3 <- p2 + theme_classic()
str(p3)
## Object size: 78 kB
## List of 9
##  $ data       :'data.frame': 100 obs. of  5 variables:
##  $ layers     :List of 1
##  $ scales     :Classes 'ScalesList', 'ggproto', 'gg' <ggproto object: Class ScalesList, gg>
##     add: function
##     clone: function
##     find: function
##     get_scales: function
##     has_scale: function
##     input: function
##     n: function
##     non_position_scales: function
##     scales: list
##     super:  <ggproto object: Class ScalesList, gg> 
##  $ mapping    :List of 3
##  $ theme      :List of 66
##  $ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto', 'gg' <ggproto object: Class CoordCartesian, Coord, gg>
##     aspect: function
##     backtransform_range: function
##     clip: on
##     default: TRUE
##     distance: function
##     expand: TRUE
##     is_free: function
##     is_linear: function
##     labels: function
##     limits: list
##     modify_scales: function
##     range: function
##     render_axis_h: function
##     render_axis_v: function
##     render_bg: function
##     render_fg: function
##     setup_data: function
##     setup_layout: function
##     setup_panel_params: function
##     setup_params: function
##     transform: function
##     super:  <ggproto object: Class CoordCartesian, Coord, gg> 
##  $ facet      :Classes 'FacetNull', 'Facet', 'ggproto', 'gg' <ggproto object: Class FacetNull, Facet, gg>
##     compute_layout: function
##     draw_back: function
##     draw_front: function
##     draw_labels: function
##     draw_panels: function
##     finish_data: function
##     init_scales: function
##     map_data: function
##     params: list
##     setup_data: function
##     setup_params: function
##     shrink: TRUE
##     train_scales: function
##     vars: function
##     super:  <ggproto object: Class FacetNull, Facet, gg> 
##  $ plot_env   :<environment: R_GlobalEnv> 
##  $ labels     :List of 3

Themes are stored as nested lists. To keep the output short we use max.level = 2, although max.level = 3 would be needed to see all nested members.

str(p3, max.level = 2, components = "theme")
## Object size: 66.8 kB
## List of 1
##  $ theme:List of 66
##   ..$ line                      :List of 6
##   ..$ rect                      :List of 5
##   ..$ text                      :List of 11
##   ..$ axis.title.x              :List of 11
##   ..$ axis.title.x.top          :List of 11
##   ..$ axis.title.y              :List of 11
##   ..$ axis.title.y.right        :List of 11
##   ..$ axis.text                 :List of 11
##   ..$ axis.text.x               :List of 11
##   ..$ axis.text.x.top           :List of 11
##   ..$ axis.text.y               :List of 11
##   ..$ axis.text.y.right         :List of 11
##   ..$ axis.ticks                :List of 6
##   ..$ axis.ticks.length         : 'unit' num 2.75pt
##   ..$ axis.ticks.length.x       : NULL
##   ..$ axis.ticks.length.x.top   : NULL
##   ..$ axis.ticks.length.x.bottom: NULL
##   ..$ axis.ticks.length.y       : NULL
##   ..$ axis.ticks.length.y.left  : NULL
##   ..$ axis.ticks.length.y.right : NULL
##   ..$ axis.line                 :List of 6
##   ..$ axis.line.x               : NULL
##   ..$ axis.line.y               : NULL
##   ..$ legend.background         :List of 5
##   ..$ legend.margin             : 'margin' num [1:4] 5.5pt 5.5pt 5.5pt 5.5pt
##   ..$ legend.spacing            : 'unit' num 11pt
##   ..$ legend.spacing.x          : NULL
##   ..$ legend.spacing.y          : NULL
##   ..$ legend.key                : list()
##   ..$ legend.key.size           : 'unit' num 1.2lines
##   ..$ legend.key.height         : NULL
##   ..$ legend.key.width          : NULL
##   ..$ legend.text               :List of 11
##   ..$ legend.text.align         : NULL
##   ..$ legend.title              :List of 11
##   ..$ legend.title.align        : NULL
##   ..$ legend.position           : chr "right"
##   ..$ legend.direction          : NULL
##   ..$ legend.justification      : chr "center"
##   ..$ legend.box                : NULL
##   ..$ legend.box.margin         : 'margin' num [1:4] 0cm 0cm 0cm 0cm
##   ..$ legend.box.background     : list()
##   ..$ legend.box.spacing        : 'unit' num 11pt
##   ..$ panel.background          :List of 5
##   ..$ panel.border              : list()
##   ..$ panel.spacing             : 'unit' num 5.5pt
##   ..$ panel.spacing.x           : NULL
##   ..$ panel.spacing.y           : NULL
##   ..$ panel.grid                :List of 6
##   ..$ panel.grid.minor          : list()
##   ..$ panel.ontop               : logi FALSE
##   ..$ plot.background           :List of 5
##   ..$ plot.title                :List of 11
##   ..$ plot.subtitle             :List of 11
##   ..$ plot.caption              :List of 11
##   ..$ plot.tag                  :List of 11
##   ..$ plot.tag.position         : chr "topleft"
##   ..$ plot.margin               : 'margin' num [1:4] 5.5pt 5.5pt 5.5pt 5.5pt
##   ..$ strip.background          :List of 5
##   ..$ strip.placement           : chr "inside"
##   ..$ strip.text                :List of 11
##   ..$ strip.text.x              : NULL
##   ..$ strip.text.y              :List of 11
##   ..$ strip.switch.pad.grid     : 'unit' num 2.75pt
##   ..$ strip.switch.pad.wrap     : 'unit' num 2.75pt
##   ..$ panel.grid.major          : list()

Data mappings in ggplots

How does mapping work? Geometries (geoms) and statistics (stats) do not “see” the original variable names. Instead, the data passed to them are named according to the aesthetics that the user's variables are mapped to. Geoms and stats work in tandem, with geoms doing the actual plotting and stats summarizing or transforming the data. It can be instructive to see what data are received as input by a geom or stat, and what data are returned by a stat.

Both geoms and stats can have either panel or group functions. Panel functions receive as input the subset of the data that corresponds to a whole panel, mapped to the aesthetics and with factors indicating the grouping (set by the user by mapping to a discrete scale). Group functions receive as input the subset of data corresponding to a single group based on the mapping, and are called once for each group present in a panel.
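
The once-per-group behaviour can be observed with a throwaway statistic. The sketch below defines a hypothetical StatRecordGroup (not part of ‘ggplot2’ or ‘gginnards’) whose group function records how many rows it receives on each call and returns the data unchanged.

```r
library(ggplot2)

# Environment used to record what the group function sees
calls <- new.env()
calls$n_rows <- integer()

# Hypothetical stat, for illustration only
StatRecordGroup <- ggproto("StatRecordGroup", Stat,
  compute_group = function(data, scales) {
    calls$n_rows <- c(calls$n_rows, nrow(data))
    data  # pass the data on unchanged
  }
)

df <- data.frame(x = 1:6, y = (1:6)^2, g = rep(c("A", "B"), 3))

p <- ggplot(df, aes(x, y, colour = g)) +
  layer(stat = StatRecordGroup, geom = "point", position = "identity")

# Building the plot runs the statistic: one call per group, 3 rows each
built <- ggplot_build(p)
calls$n_rows
```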

The motivation for writing the “debug” stats and geoms included in package ‘gginnards’ is that at the moment it is in many cases not possible to set breakpoints inside the code of stats and geoms, because frequently nameless panel and group functions are stored within list-like "ggplot" objects as seen above.

This can make it tedious to analyse how these functions work, as one may need to add print statements to their definitions to see the data. I wrote the “debug” stats and geoms as tools to help in the development of my packages ‘ggpmisc’ and ‘ggspectra’, and as a way of learning for myself how data are passed around within the different components of a ggplot object when it is printed.

Data input to geometries

Data pass through a statistic before being received by a geometry. However, many geometries, like geom_point() and geom_line(), use by default stat_identity(), which simply relays the unmodified data to the geometries.

By default, the debug geometries and statistics in package ‘gginnards’ do not add any graphical element to the plot; instead they make visible the data they receive as input.

The geometry geom_debug() uses stat_identity() by default. Here the same data as rendered by geom_point() are printed as a tibble to the R console. We can see that the columns are named according to the aesthetics to which the variables in the user-supplied data have been mapped. In the case of colour, the levels of the factor have been replaced by colour definitions. Columns PANEL and group have also been added.

ggplot(mpg, aes(cyl, hwy, colour = factor(cyl))) + 
  geom_point() +
  geom_debug()
## # A tibble: 234 x 5
##    colour      x     y PANEL group
##    <chr>   <dbl> <dbl> <fct> <int>
##  1 #F8766D     4    29 1         1
##  2 #F8766D     4    29 1         1
##  3 #F8766D     4    31 1         1
##  4 #F8766D     4    30 1         1
##  5 #00BFC4     6    26 1         3
##  6 #00BFC4     6    26 1         3
##  7 #00BFC4     6    27 1         3
##  8 #F8766D     4    26 1         1
##  9 #F8766D     4    25 1         1
## 10 #F8766D     4    28 1         1
## # ... with 224 more rows
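
For comparison, ‘ggplot2’ itself provides layer_data(), which gives a related view of a layer's data. The sketch below is not equivalent to geom_debug(): layer_data() returns the data after the geom's default aesthetics have been filled in, so it contains additional columns.

```r
library(ggplot2)

p <- ggplot(mpg, aes(cyl, hwy, colour = factor(cyl))) +
  geom_point()

# Data for the first (and only) layer of the built plot
geom_input <- layer_data(p, i = 1)

# Columns are named after aesthetics, not after the original variables
head(geom_input[, c("colour", "x", "y", "PANEL", "group")], 3)
```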

Below we show how geom_debug() can be used together with functions that take a data frame as input and return a value that can be printed. Here we use head(), but other functions such as summary(), nrow() and colnames(), as well as user-defined functions, can be useful when the data are large. As shown here, additional arguments can be passed by name to the function.

ggplot(my.data, aes(x, y, colour = group)) + 
  geom_point() + 
  geom_debug(summary.fun = head, summary.fun.args = list(n = 3))
##    colour x         y PANEL group
## 1 #F8766D 1 -27205.45     1     1
## 2 #00BFC4 2 -14242.65     1     2
## 3 #F8766D 3  45790.92     1     1
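
A user-defined summary function only needs to accept a data frame as its first argument. The helper below, group_ranges(), is a hypothetical example written for this illustration; it is tested here on a stand-in data frame with the column names a geometry would receive.

```r
# Hypothetical helper: one row per group with the row count and range of y
group_ranges <- function(data) {
  do.call(rbind, lapply(split(data, data$group), function(d) {
    data.frame(group = d$group[1],
               n     = nrow(d),
               ymin  = min(d$y),
               ymax  = max(d$y))
  }))
}

# Stand-in for the data received by a geometry
demo <- data.frame(x = 1:4,
                   y = c(2, 8, 4, 6),
                   PANEL = factor(1),
                   group = c(1L, 2L, 1L, 2L))

res <- group_ranges(demo)
res
```

With the interface shown in this guide, such a function would be passed as geom_debug(summary.fun = group_ranges).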

When using a statistic that modifies the data, we can pass geom_debug() as argument in the call to this statistic. In this way the data printed to the console will be those returned by the statistic and received by the geometry.

ggplot(mpg, aes(cyl, hwy, colour = factor(cyl))) +
  stat_summary(fun.data = "mean_se") +
  stat_summary(fun.data = "mean_se", geom = "debug") 
## # A tibble: 4 x 7
##   colour      x group     y  ymin  ymax PANEL
##   <chr>   <dbl> <int> <dbl> <dbl> <dbl> <fct>
## 1 #F8766D     4     1  28.8  28.3  29.3 1    
## 2 #7CAE00     5     2  28.8  28.5  29   1    
## 3 #00BFC4     6     3  22.8  22.4  23.2 1    
## 4 #C77CFF     8     4  17.6  17.2  18.0 1
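
The values in one row of this output can be cross-checked by hand. As a sketch, the code below recomputes the mean plus and minus one standard error for the four-cylinder cars and compares the result with that of mean_se() from ‘ggplot2’; the variable names are chosen for this illustration.

```r
library(ggplot2)

hwy4 <- mpg$hwy[mpg$cyl == 4]

# Standard error of the mean, computed by hand
se <- sd(hwy4) / sqrt(length(hwy4))
manual <- c(y = mean(hwy4), ymin = mean(hwy4) - se, ymax = mean(hwy4) + se)

# The same summary as computed by ggplot2
from_ggplot2 <- unlist(mean_se(hwy4))

all.equal(unname(manual), unname(from_ggplot2))
```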

As shown above, an important use of geom_debug() is to display the data returned by a statistic and received as input by geometries. Not all extensions to ‘ggplot2’ document all the computed variables returned by statistics. In other cases, as in the next example, the values returned depend on the arguments passed. While in the previous example the statistic returned a data frame with one row per group, here the returned data frame has 160 rows. The data are by default plotted as a line with a confidence band.

ggplot(my.data, aes(x, y, colour = group)) + 
  geom_point() + 
  stat_smooth(method = "lm", formula = y ~ poly(x, 2)) +
  stat_smooth(method = "lm", formula = y ~ poly(x, 2), geom = "debug")
## # A tibble: 160 x 8
##    colour      x      y    ymin   ymax     se PANEL group
##    <chr>   <dbl>  <dbl>   <dbl>  <dbl>  <dbl> <fct> <int>
##  1 #F8766D  1    23456. -26311. 73223. 24738. 1         1
##  2 #F8766D  2.24 18462. -28860. 65784. 23523. 1         1
##  3 #F8766D  3.48 13883. -31095. 58861. 22358. 1         1
##  4 #F8766D  4.72  9719. -33018. 52456. 21244. 1         1
##  5 #F8766D  5.96  5970. -34632. 46573. 20183. 1         1
##  6 #F8766D  7.20  2637. -35941. 41215. 19176. 1         1
##  7 #F8766D  8.44  -281. -36948. 36386. 18227. 1         1
##  8 #F8766D  9.68 -2783. -37657. 32090. 17335. 1         1
##  9 #F8766D 10.9  -4871. -38072. 28331. 16504. 1         1
## 10 #F8766D 12.2  -6543. -38197. 25112. 15735. 1         1
## # ... with 150 more rows
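
The row count follows from parameter n of stat_smooth(), the number of points at which the fitted curve is evaluated in each group; its default of 80 gives 2 × 80 = 160 rows for two groups. The sketch below uses made-up data and layer_data() from ‘ggplot2’ to show the effect of changing n.

```r
library(ggplot2)

# Made-up data with two groups, for illustration only
set.seed(1)
df <- data.frame(x = rep(1:20, 2), group = rep(c("A", "B"), each = 20))
df$y <- df$x^2 + rnorm(nrow(df), sd = 10)

p <- ggplot(df, aes(x, y, colour = group)) +
  stat_smooth(method = "lm", formula = y ~ poly(x, 2), n = 10)

# With n = 10 the statistic returns 10 rows per group
smooth_out <- layer_data(p, i = 1)
nrow(smooth_out)
table(smooth_out$group)
```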

Data input to statistics

Statistics can be defined to operate on data corresponding to a whole panel or separately on data corresponding to each individual group, as created by mapping aesthetics to factors. By default, the statistics described below print a summary of their data input to the console. In addition, these statistics return a data frame containing text mapped to labels suitable for “plotting” with geom “text” or geom “label”. This text gives a summary of the data.

The ‘gginnards’ package defines a "null" geom, which is used as default by the debug statistics. This geom is similar to the more recently added ggplot2::geom_blank().

ggplot(my.data, aes(x, y, colour = group)) + 
  geom_null()

Using geom “null” as the default allows us to add the debug stats for their side effect of console output without altering the rendering of the plot when there is at least one other plot layer.

Because of the way ‘ggplot2’ works, the values are listed to the console at the time when the ggplot object is printed. As shown here, no other geom or stat is required; however, in the remaining examples we add geom_point() to make the data also visible in the plot.

ggplot(my.data, aes(x, y)) + 
  stat_debug_group()
## [1] "Input 'data' to 'compute_group()':"
## # A tibble: 100 x 4
##        x       y PANEL group
##    <dbl>   <dbl> <fct> <int>
##  1     1 -27205. 1        -1
##  2     2 -14243. 1        -1
##  3     3  45791. 1        -1
##  4     4  53731. 1        -1
##  5     5  -8029. 1        -1
##  6     6 102864. 1        -1
##  7     7 -18547. 1        -1
##  8     8  13081. 1        -1
##  9     9  79924. 1        -1
## 10    10 -44711. 1        -1
## # ... with 90 more rows
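
The timing of the console output can be demonstrated without ‘gginnards’. In this sketch a hypothetical StatCountCalls counts calls to its group function; the counter stays at zero when the plot object is created and changes only when the plot is built, which is the step that printing triggers.

```r
library(ggplot2)

counter <- new.env()
counter$calls <- 0L

# Hypothetical stat, for illustration only
StatCountCalls <- ggproto("StatCountCalls", Stat,
  compute_group = function(data, scales) {
    counter$calls <- counter$calls + 1L
    data
  }
)

p <- ggplot(data.frame(x = 1:3, y = 1:3), aes(x, y)) +
  layer(stat = StatCountCalls, geom = "point", position = "identity")

calls_before <- counter$calls   # plot object exists, nothing computed yet
built <- ggplot_build(p)        # statistics are computed here
calls_after <- counter$calls

c(before = calls_before, after = calls_after)
```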

In the absence of facets or groups we get the printout of a single data frame, which is similar to that returned by geom_debug(). Without grouping, group is set to -1 for all observations.

ggplot(my.data, aes(x, y)) + 
  geom_point() + 
  stat_debug_group()
## [1] "Input 'data' to 'compute_group()':"
## # A tibble: 100 x 4
##        x       y PANEL group
##    <dbl>   <dbl> <fct> <int>
##  1     1 -27205. 1        -1
##  2     2 -14243. 1        -1
##  3     3  45791. 1        -1
##  4     4  53731. 1        -1
##  5     5  -8029. 1        -1
##  6     6 102864. 1        -1
##  7     7 -18547. 1        -1
##  8     8  13081. 1        -1
##  9     9  79924. 1        -1
## 10    10 -44711. 1        -1
## # ... with 90 more rows

In a plot with no grouping, there is no difference in the data input for compute_panel() and compute_group() functions (this applies in general to ggplot statistics).

ggplot(my.data, aes(x, y)) + 
  geom_point() + 
  stat_debug_panel()
## [1] "Input 'data' to 'compute_panel()':"
## # A tibble: 100 x 4
##        x       y PANEL group
##    <dbl>   <dbl> <fct> <int>
##  1     1 -27205. 1        -1
##  2     2 -14243. 1        -1
##  3     3  45791. 1        -1
##  4     4  53731. 1        -1
##  5     5  -8029. 1        -1
##  6     6 102864. 1        -1
##  7     7 -18547. 1        -1
##  8     8  13081. 1        -1
##  9     9  79924. 1        -1
## 10    10 -44711. 1        -1
## # ... with 90 more rows

By mapping the colour aesthetic we create a grouping. In this case, compute_group() is called with the data subsetted by group, and a separate data frame is displayed for each call to compute_group(), each corresponding to a level of the mapped factor. In this case group takes as values consecutive positive integers. As a factor was mapped to colour, colour is encoded as a factor.

ggplot(my.data, aes(x, y, colour = group)) + 
  geom_point() + 
  stat_debug_group()
## [1] "Input 'data' to 'compute_group()':"
## # A tibble: 50 x 5
##        x       y colour PANEL group
##    <dbl>   <dbl> <fct>  <fct> <int>
##  1     1 -27205. A      1         1
##  2     3  45791. A      1         1
##  3     5  -8029. A      1         1
##  4     7 -18547. A      1         1
##  5     9  79924. A      1         1
##  6    11  -2824. A      1         1
##  7    13 -78017. A      1         1
##  8    15 -74281. A      1         1
##  9    17   9904. A      1         1
## 10    19 -94023. A      1         1
## # ... with 40 more rows
## [1] "Input 'data' to 'compute_group()':"
## # A tibble: 50 x 5
##        x       y colour PANEL group
##    <dbl>   <dbl> <fct>  <fct> <int>
##  1     2 -14243. B      1         2
##  2     4  53731. B      1         2
##  3     6 102864. B      1         2
##  4     8  13081. B      1         2
##  5    10 -44711. B      1         2
##  6    12  23840. B      1         2
##  7    14  75602. B      1         2
##  8    16 104677. B      1         2
##  9    18 -68747. B      1         2
## 10    20 -39230. B      1         2
## # ... with 40 more rows
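
The values of group with and without a grouping can be compared side by side. This sketch uses made-up data and layer_data() from ‘ggplot2’, which shows the data on the geometry side, where colour has already been converted to colour definitions.

```r
library(ggplot2)

df <- data.frame(x = 1:6,
                 y = c(3, 1, 4, 1, 5, 9),
                 group = rep(c("A", "B"), 3))

ungrouped <- layer_data(ggplot(df, aes(x, y)) + geom_point())
grouped   <- layer_data(ggplot(df, aes(x, y, colour = group)) + geom_point())

# No grouping: a single placeholder value
unique(ungrouped$group)

# Grouping by a two-level factor: consecutive positive integers
sort(unique(grouped$group))
```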

Without facets, we still have only one panel.

ggplot(my.data, aes(x, y, colour = group)) + 
  geom_point() + 
  stat_debug_panel()
## [1] "Input 'data' to 'compute_panel()':"
## # A tibble: 100 x 5
##        x       y colour PANEL group
##    <dbl>   <dbl> <fct>  <fct> <int>
##  1     1 -27205. A      1         1
##  2     2 -14243. B      1         2
##  3     3  45791. A      1         1
##  4     4  53731. B      1         2
##  5     5  -8029. A      1         1
##  6     6 102864. B      1         2
##  7     7 -18547. A      1         1
##  8     8  13081. B      1         2
##  9     9  79924. A      1         1
## 10    10 -44711. B      1         2
## # ... with 90 more rows

When we map the same factor to a different aesthetic the data remain similar, except for the column named after the aesthetic, in this case shape.

ggplot(my.data, aes(x, y, shape = group)) + 
  geom_point() + 
  stat_debug_group()
## [1] "Input 'data' to 'compute_group()':"
## # A tibble: 50 x 5
##        x       y shape PANEL group
##    <dbl>   <dbl> <fct> <fct> <int>
##  1     1 -27205. A     1         1
##  2     3  45791. A     1         1
##  3     5  -8029. A     1         1
##  4     7 -18547. A     1         1
##  5     9  79924. A     1         1
##  6    11  -2824. A     1         1
##  7    13 -78017. A     1         1
##  8    15 -74281. A     1         1
##  9    17   9904. A     1         1
## 10    19 -94023. A     1         1
## # ... with 40 more rows
## [1] "Input 'data' to 'compute_group()':"
## # A tibble: 50 x 5
##        x       y shape PANEL group
##    <dbl>   <dbl> <fct> <fct> <int>
##  1     2 -14243. B     1         2
##  2     4  53731. B     1         2
##  3     6 102864. B     1         2
##  4     8  13081. B     1         2
##  5    10 -44711. B     1         2
##  6    12  23840. B     1         2
##  7    14  75602. B     1         2
##  8    16 104677. B     1         2
##  9    18 -68747. B     1         2
## 10    20 -39230. B     1         2
## # ... with 40 more rows

Facets based on factors create panels within a plot. Here we create a plot with both facets and grouping. In this case, for each panel the compute_panel() function is called once with a subset of the data that corresponds to one panel, but not split by groups. For our example, it is called twice.

ggplot(my.data, aes(x, y, colour = group)) + 
  geom_point() + 
  stat_debug_panel(summary.fun = nrow) +
  facet_wrap(~block)
## [1] "Input 'data' to 'compute_panel()':"
## [1] 50
## [1] "Input 'data' to 'compute_panel()':"
## [1] 50

With grouping and facets, within each panel the compute_group() function is called once for each group, in total four times.

ggplot(my.data, aes(x, y, colour = group)) + 
  geom_point() + 
  stat_debug_group() +
  facet_wrap(~block)
## [1] "Input 'data' to 'compute_group()':"
## # A tibble: 25 x 5
##        x       y colour PANEL group
##    <dbl>   <dbl> <fct>  <fct> <int>
##  1     1 -27205. A      1         1
##  2     5  -8029. A      1         1
##  3     9  79924. A      1         1
##  4    13 -78017. A      1         1
##  5    17   9904. A      1         1
##  6    21  40551. A      1         1
##  7    25   9950. A      1         1
##  8    29 -27902. A      1         1
##  9    33 109170. A      1         1
## 10    37  40859. A      1         1
## # ... with 15 more rows
## [1] "Input 'data' to 'compute_group()':"
## # A tibble: 25 x 5
##        x       y colour PANEL group
##    <dbl>   <dbl> <fct>  <fct> <int>
##  1     2 -14243. B      1         2
##  2     6 102864. B      1         2
##  3    10 -44711. B      1         2
##  4    14  75602. B      1         2
##  5    18 -68747. B      1         2
##  6    22  10961. B      1         2
##  7    26   3102. B      1         2
##  8    30 -59669. B      1         2
##  9    34  10203. B      1         2
## 10    38 104296. B      1         2
## # ... with 15 more rows
## [1] "Input 'data' to 'compute_group()':"
## # A tibble: 25 x 5
##        x       y colour PANEL group
##    <dbl>   <dbl> <fct>  <fct> <int>
##  1     3  45791. A      2         1
##  2     7 -18547. A      2         1
##  3    11  -2824. A      2         1
##  4    15 -74281. A      2         1
##  5    19 -94023. A      2         1
##  6    23  12150. A      2         1
##  7    27  23485. A      2         1
##  8    31  39727. A      2         1
##  9    35  98482. A      2         1
## 10    39  76003. A      2         1
## # ... with 15 more rows
## [1] "Input 'data' to 'compute_group()':"
## # A tibble: 25 x 5
##        x        y colour PANEL group
##    <dbl>    <dbl> <fct>  <fct> <int>
##  1     4  53731.  B      2         2
##  2     8  13081.  B      2         2
##  3    12  23840.  B      2         2
##  4    16 104677.  B      2         2
##  5    20 -39230.  B      2         2
##  6    24  52254.  B      2         2
##  7    28  41669.  B      2         2
##  8    32  76039.  B      2         2
##  9    36     74.0 B      2         2
## 10    40 146451.  B      2         2
## # ... with 15 more rows

Controlling the debug output

In the examples above we have demonstrated the use of the statistics and geometries using default arguments. Here we show examples of generation of other types of debug output.

Display debug output on the plot

Unlike geom_debug(), where the data can only be printed to the console, stat_debug_group() and stat_debug_panel() return data that can be plotted using a geometry. If we use "label" or "text" as the geometry, a debug summary is added to the plot itself. We can also pass other arguments valid for the geometry used, in this case vjust.

## [1] "Input 'data' to 'compute_group()':"
## # A tibble: 50 x 5
##        x       y shape PANEL group
##    <dbl>   <dbl> <fct> <fct> <int>
##  1     1 -27205. A     1         1
##  2     3  45791. A     1         1
##  3     5  -8029. A     1         1
##  4     7 -18547. A     1         1
##  5     9  79924. A     1         1
##  6    11  -2824. A     1         1
##  7    13 -78017. A     1         1
##  8    15 -74281. A     1         1
##  9    17   9904. A     1         1
## 10    19 -94023. A     1         1
## # ... with 40 more rows
## [1] "Input 'data' to 'compute_group()':"
## # A tibble: 50 x 5
##        x       y shape PANEL group
##    <dbl>   <dbl> <fct> <fct> <int>
##  1     2 -14243. B     1         2
##  2     4  53731. B     1         2
##  3     6 102864. B     1         2
##  4     8  13081. B     1         2
##  5    10 -44711. B     1         2
##  6    12  23840. B     1         2
##  7    14  75602. B     1         2
##  8    16 104677. B     1         2
##  9    18 -68747. B     1         2
## 10    20 -39230. B     1         2
## # ... with 40 more rows

This approach is of limited use except for highlighting groups and panels, possibly when learning or teaching how ggplots are assembled.

## [1] "Input 'data' to 'compute_panel()':"
## # A tibble: 50 x 5
##        x       y colour PANEL group
##    <dbl>   <dbl> <fct>  <fct> <int>
##  1     1 -27205. A      1         1
##  2     2 -14243. B      1         2
##  3     5  -8029. A      1         1
##  4     6 102864. B      1         2
##  5     9  79924. A      1         1
##  6    10 -44711. B      1         2
##  7    13 -78017. A      1         1
##  8    14  75602. B      1         2
##  9    17   9904. A      1         1
## 10    18 -68747. B      1         2
## # ... with 40 more rows
## [1] "Input 'data' to 'compute_panel()':"
## # A tibble: 50 x 5
##        x       y colour PANEL group
##    <dbl>   <dbl> <fct>  <fct> <int>
##  1     3  45791. A      2         1
##  2     4  53731. B      2         2
##  3     7 -18547. A      2         1
##  4     8  13081. B      2         2
##  5    11  -2824. A      2         1
##  6    12  23840. B      2         2
##  7    15 -74281. A      2         1
##  8    16 104677. B      2         2
##  9    19 -94023. A      2         1
## 10    20 -39230. B      2         2
## # ... with 40 more rows

Of course, to see all the returned variables, we can use geom_debug().

## [1] "Input 'data' to 'compute_group()':"
## # A tibble: 50 x 5
##        x       y shape PANEL group
##    <dbl>   <dbl> <fct> <fct> <int>
##  1     1 -27205. A     1         1
##  2     3  45791. A     1         1
##  3     5  -8029. A     1         1
##  4     7 -18547. A     1         1
##  5     9  79924. A     1         1
##  6    11  -2824. A     1         1
##  7    13 -78017. A     1         1
##  8    15 -74281. A     1         1
##  9    17   9904. A     1         1
## 10    19 -94023. A     1         1
## # ... with 40 more rows
## [1] "Input 'data' to 'compute_group()':"
## # A tibble: 50 x 5
##        x       y shape PANEL group
##    <dbl>   <dbl> <fct> <fct> <int>
##  1     2 -14243. B     1         2
##  2     4  53731. B     1         2
##  3     6 102864. B     1         2
##  4     8  13081. B     1         2
##  5    10 -44711. B     1         2
##  6    12  23840. B     1         2
##  7    14  75602. B     1         2
##  8    16 104677. B     1         2
##  9    18 -68747. B     1         2
## 10    20 -39230. B     1         2
## # ... with 40 more rows
## # A tibble: 2 x 10
##   shape label             x      y  nrow  ncol colnames  colclasses  group PANEL
##   <dbl> <chr>         <dbl>  <dbl> <int> <int> <fct>     <fct>       <fct> <fct>
## 1    16 "group: 1; P~    50 4.45e5    50     5 x, y, sh~ x: numeric~ 1     1    
## 2    17 "group: 2; P~    51 5.04e5    50     5 x, y, sh~ x: numeric~ 2     1

Assign debug output to a variable

In the next example we show how to save the data input of the geom to a variable in the global environment. However, assignment takes place at the time the ggplot object is printed. An approach like this could be used to capture the data output of statistics in automated code tests.

pipe_assign <- function(value, name, pos = .GlobalEnv, ...) {
  assign(x = name, value = value, inherits = FALSE, pos = pos, ...)
}

ggplot(my.data, aes(x, y, colour = group)) + 
  geom_point() + 
  geom_debug(summary.fun = pipe_assign, 
             summary.fun.args = list(name = "debug_data"),
             print.fun = NULL)

head(debug_data)
##    colour x          y PANEL group
## 1 #F8766D 1 -27205.450     1     1
## 2 #00BFC4 2 -14242.651     1     2
## 3 #F8766D 3  45790.918     1     1
## 4 #00BFC4 4  53731.420     1     2
## 5 #F8766D 5  -8028.578     1     1
## 6 #00BFC4 6 102863.943     1     2
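
In automated tests it may be preferable not to write into the global environment. As a sketch, the same pipe_assign() can target a dedicated environment passed through its pos parameter; the environment captured and the stand-in data frame are assumptions made for this illustration.

```r
pipe_assign <- function(value, name, pos = .GlobalEnv, ...) {
  assign(x = name, value = value, inherits = FALSE, pos = pos, ...)
}

# Dedicated environment that receives the captured data
captured <- new.env()

# Stand-in for the data a geometry would receive
geom_input <- data.frame(x = 1:3, y = c(2.5, 1.0, 4.2))

pipe_assign(geom_input, name = "debug_data", pos = captured)

# The variable exists in 'captured' only
ls(captured)
get("debug_data", envir = captured)
```

With the interface shown above, the environment would presumably be supplied as an additional named element, as in summary.fun.args = list(name = "debug_data", pos = captured).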