One of the things that Stata has that RStudio lacks is the variable explorer. This extremely useful especially if you’re working with datasets with a large number of variables with hard to remember names, but descriptive labels. In Stata you just search in the variable explorer and then click on the variable to get its name into the console.
As an example of a dataset like this, consider the Quality of Government standard dataset. Here’s the 2018 version of the cross-section data:
It has 194 observations (different countries) and 1882 variables.
The variables have names like wdi_acelu
, bci_bci
, eu_eco2gdpeurhab
, gle_cgdpc
etc. Not exactly things you want to remember.
Working with this in Stata is relatively easy because you just search in the variable explorer for things like “sanitation”, “corruption”, “GDP”, etc. and you find the variable names.
Unfortunately, RStudio doesn’t have a variable explorer panel. But you can improvise something like the following:
BAM! We just made a variable explorer! If you run this code in the console it opens the DT::datatable
in the RStudio’s Viewer pane, which is pretty much replicating the Stata experience (except that it is read-only).
But we can do better! Why not include additional information, like the number of missing observations, summary statistics, or an overview of the values of each variable?
vars_explore
This will create a searchable variable explorer, and calculate summary statistics for each variable:
The table is searchable, and you can, furthermore, arrange it, say, based on which variable has least missing values. For instance, search for “GDP per capita” and see which variable provides most complete information.
If you click on the + next to a variable, you will get both the summary statistics, and unique values and, if present, the value labels. The option value.labels.width
limits how many characters to include in the “Value labels” and “Values” columns. Default is 500.
If you set minimal = TRUE
, only “Variable”, “Description”, “Obs.”, and “Missing” will be shown (and none the summary stats will be calculated).
By default, vars_explore
only shows the summary in RStudio’s Viewer Pane, and doesn’t return anything. But you can change this by setting viewer = FALSE
and silent = FALSE
. This speeds it us massively because it’s the DT::datatable()
that takes most of the time, and allows you to build summary stats (e.g. for a paper):
vdem_summary <- qog %>%
select(starts_with("vdem_")) %>%
vars_explore(viewer = FALSE, silent = FALSE) %>%
select(-Values, -`Value labels`)
knitr::kable(vdem_summary)
Variable | Description | Type | Obs. | Missing | Mean | Median | Std.Dev. | Min | Max |
---|---|---|---|---|---|---|---|---|---|
vdem_corr | Political corruption index | numeric | 168 | 26 | 0.51 | 0.57 | 0.29 | 0.010031 | 0.968504 |
vdem_delibdem | Deliberative democracy index | numeric | 168 | 26 | 0.41 | 0.39 | 0.26 | 0.002548 | 0.912120 |
vdem_dl_delib | Deliberative component index | numeric | 168 | 26 | 0.66 | 0.74 | 0.25 | 0.020510 | 0.989109 |
vdem_edcomp_thick | Electoral component index | numeric | 168 | 26 | 0.62 | 0.65 | 0.24 | 0.004832 | 0.939636 |
vdem_egal | Egalitarian component index | numeric | 168 | 26 | 0.62 | 0.63 | 0.20 | 0.086303 | 0.969009 |
vdem_egaldem | Egalitarian democracy index | numeric | 168 | 26 | 0.42 | 0.36 | 0.23 | 0.043189 | 0.869084 |
vdem_elvotbuy | Election vote buying | numeric | 162 | 32 | 2.09 | 1.82 | 1.03 | 0.211176 | 3.982266 |
vdem_exbribe | Executive bribery and corrupt exchanges | numeric | 168 | 26 | 1.93 | 1.79 | 1.06 | 0.097352 | 3.938878 |
vdem_excrptps | Public sector corrupt exchanges | numeric | 168 | 26 | 1.82 | 1.69 | 1.01 | 0.113801 | 3.937302 |
vdem_execorr | Executive corruption index | numeric | 168 | 26 | 0.48 | 0.52 | 0.30 | 0.010515 | 0.974874 |
vdem_exembez | Executive embezzlement and theft | numeric | 168 | 26 | 2.13 | 2.11 | 1.11 | 0.091897 | 3.924967 |
vdem_exthftps | Public sector theft | numeric | 168 | 26 | 0.06 | -0.13 | 1.49 | -3.137724 | 3.399789 |
vdem_gcrrpt | Legislature corrupt activities | numeric | 168 | 26 | 1.85 | 1.67 | 0.89 | 0.299263 | 3.894069 |
vdem_gender | Women political empowerment index | numeric | 168 | 26 | 0.74 | 0.79 | 0.17 | 0.274245 | 0.956486 |
vdem_jucorrdc | Judicial corruption decision | numeric | 168 | 26 | 2.26 | 2.25 | 0.85 | 0.475427 | 3.825114 |
vdem_libdem | Liberal democracy index | numeric | 168 | 26 | 0.43 | 0.40 | 0.26 | 0.015430 | 0.886297 |
vdem_liberal | Liberal component index | numeric | 168 | 26 | 0.62 | 0.68 | 0.25 | 0.037652 | 0.975519 |
vdem_mecorrpt | Media corrupt | numeric | 168 | 26 | 2.54 | 2.62 | 0.97 | 0.079457 | 3.966262 |
vdem_partip | Participatory component index | numeric | 168 | 26 | 0.47 | 0.52 | 0.16 | 0.042004 | 0.711231 |
vdem_partipdem | Participatory democracy index | numeric | 168 | 26 | 0.34 | 0.34 | 0.19 | 0.014923 | 0.672246 |
vdem_polyarchy | Electoral democracy index | numeric | 168 | 26 | 0.55 | 0.57 | 0.24 | 0.026478 | 0.924651 |
vdem_pubcorr | Public sector corruption index | numeric | 168 | 26 | 0.50 | 0.52 | 0.30 | 0.005353 | 0.981405 |
You can also opt to use View()
instead of the Viewer Pane, which works much faster than DT::datatable()
, although, given how RStudio works, this makes hard to see both the variable explorer and your script at the same time.
The best alternative to vars_explore
is the vtable
package. The downsides of vtable
are
You can, however, use vtable
to generate a dataframe that can be opened with View()
, just like you can with vars_explore(silent = FALSE, viewer = FALSE)
. Unfortunately, what makes vtable
faster is precisely its limitations, as the most time consuming part is loading up the DT::datatable
, rather than calculating anything. vtable
works fast because it creates a simple HTML file, but that is not searchable in the Viewer Pane.
Another alternative is sjPlot::view_df
, which provides summary stats in individual columns, but it is very slow. Also, like vtable
, it doesn’t provide a searchable table in the Viewer Pane.
This was made possible by Reigo Hendrikson’s datatable2
: http://www.reigo.eu/2018/04/extending-dt-child-row-example/
As far as I know, Reigo hasn’t made this available in a package. It is included in this package, with some minor modifications, and you can use it with varsExplore::datatable2()
.