studentlife: Tidy Handling and Navigation of a Valuable Mobile-Health Dataset

Travis-CI Build Status Binder License: GPL v3 Code coverage Lifecycle Status

Use this R package to download, navigate and analyse the Student-Life dataset. The Student-Life dataset contains passive and automatic sensing data from the phones of a class of 48 de-identified Dartmouth college students. It was collected over a 10 week term. Additionally, the dataset contains Ecological Momentary Assessment results along with pre- and post-study mental health surveys, such as the PHQ-9. The intended use is to assess mental health, academic performance and behavioral trends. The raw dataset and additional information is available at https://studentlife.cs.dartmouth.edu/.

Install from GitHub with

## Uncomment to install the package
# install.packages("devtools")
# devtools::install_github("frycast/studentlife")

Details on the dataset are available here. Once installed, you can download and extract the data within R:

d <- tempdir()
library(studentlife)
download_studentlife(dest = d)

Then you can use the interactive menu to browse the tables and schemas:

tab <- studentlife::load_SL_tibble(location = d)

The object returned by the above function is called a ‘StudentLife tibble’ (or SL_tbl). Restrictions can be placed on the menu options with time_options:

tab_t <- load_SL_tibble(location = d, time_options = "timestamp", csv_nrows = 10)
tab_p <- load_SL_tibble(location = d, time_options = "interval" , csv_nrows = 10)
tab_d <- load_SL_tibble(location = d, time_options = "dateonly" , csv_nrows = 10)
tab_s <- load_SL_tibble(location = d, time_options = "dateless" , csv_nrows = 10)

The regularise_time function can be used to summarise information within blocks of time, producing an object called a ‘regularised StudentLife tibble’ (or reg_SL_tbl):

tab <- load_SL_tibble(
  loc = d, schema = "sensing", table = "activity", csv_nrows = 10)
  
regularise_time(
  tab, blocks = c("day","weekday"),
  act_inf = max(activity_inference), add_NAs = FALSE)

Produce a histogram showing PAM EMA response frequencies over the course of the study:

tab_PAM <- load_SL_tibble(schema = "EMA", table = "PAM", location = d)
response_hour_hist(tab_PAM, break_hours = 10)

A summary will produce details such as EMA question, UIDs of dropped students, schema name, table name, and summary statistics.

summary(tab_PAM)
EMA_questions 
 Refer to: Pollak, J. P., Adams, P., & Gay, G. (2011, May). 
 PAM: a photographic affect meter for frequent, in situ measurement of affect. 
 In Proceedings of the SIGCHI conference on Human factors in computing systems 
 (pp. 725-734). ACM. 

dropped_students 
 None 

column_names 
 picture_idx timestamp uid 

schema 
 EMA 

table 
 PAM 

Skim summary statistics
 n obs: 9040 
 n variables: 3 

-- Variable type:factor --------------------------------------------------------
 variable missing complete    n n_unique                        top_counts ordered
      uid       0     9040 9040       49 59: 437, 0: 390, 19: 384, 57: 377   FALSE

-- Variable type:numeric -------------------------------------------------------
    variable missing complete    n    mean         sd      p0     p25     p50      p75     p100
 picture_idx       0     9040 9040 8.85          4.17 1       6       8       12       16      
   timestamp       0     9040 9040 1.4e+09 1608614.45 1.4e+09 1.4e+09 1.4e+09  1.4e+09  1.4e+09
     hist
 ▂▃▆▇▃▆▅▅
 ▇▇▆▅▃▁▁▁

After regularising a StudentLife tibble, we can visualise the missing values in each block for each student:

reg_PAM <- regularise_time(tab_PAM, blocks = c("day", "epoch"), m = mean(picture_idx, na.rm = TRUE))
vis_NAs(reg_PAM, response = "m")

We can also visualise and compare the total number of responses received from each student over the course of the study:

vis_response_counts(reg_PAM, response = "m")

Summaries of survey data are formatted with question information and some answer statistics:

tab_PHQ9 <- load_SL_tibble(loc = d, schema = "survey", table = "PHQ-9")
summary(tab_PHQ9)
time_info 
 none 

survey_questions 
 Q1: Little interest or pleasure in doing things
 Q2: Feeling down, depressed, hopeless.
 Q3: Trouble falling or staying asleep, or sleeping too much.
 Q4: Feeling tired or having little energy
 Q5: Poor appetite or overeating
 Q6: Feeling bad about yourself or that you are a failure or have let yourself or your family down
 Q7: Trouble concentrating on things, such as reading the newspaper or watching television
 Q8: Moving or speaking so slowly that other people could have noticed. Or the opposite being so figety or restless that you have been moving around a lot more than usual
 Q9: Thoughts that you would be better off dead, or of hurting yourself
 Q10: Response
 

column_names 
 uid type Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 

schema 
 survey 

table 
 PHQ-9 

skim 
Skim summary statistics
 n obs: 84 
 n variables: 12 

-- Variable type:factor --------------------------------------------------------
 variable missing complete  n n_unique                        top_counts ordered
       Q1       0       84 84        4 Not: 42, Sev: 28, Mor: 10, Nea: 4   FALSE
      Q10       5       79 84        4   Som: 37, Not: 36, NA: 5, Ext: 3   FALSE
       Q2       0       84 84        4  Not: 40, Sev: 34, Mor: 6, Nea: 4   FALSE
       Q3       0       84 84        4 Not: 41, Sev: 25, Mor: 10, Nea: 8   FALSE
       Q4       0       84 84        4 Sev: 38, Not: 23, Mor: 17, Nea: 6   FALSE
       Q5       0       84 84        4  Sev: 35, Not: 34, Mor: 9, Nea: 6   FALSE
       Q6       0       84 84        4  Not: 45, Sev: 27, Mor: 6, Nea: 6   FALSE
       Q7       0       84 84        4  Not: 48, Sev: 26, Mor: 7, Nea: 3   FALSE
       Q8       0       84 84        4  Not: 64, Sev: 15, Mor: 4, Nea: 1   FALSE
       Q9       0       84 84        3    Not: 75, Sev: 6, Mor: 3, NA: 0   FALSE
     type       0       84 84        2           pre: 46, pos: 38, NA: 0   FALSE
      uid       0       84 84       46            0: 2, 1: 2, 2: 2, 3: 2   FALSE