Walking strides segmentation with adept

Marta Karas

Ciprian Crainiceanu

Jacek Urbanek

2019-06-18

This vignette provides an example of segmenting walking strides (two consecutive steps) in sub-second accelerometry data with the adept package. The example dataset is part of the adeptdata package. We demonstrate that ADEPT [1] can be used to perform automatic and precise walking stride segmentation from data collected during a combination of running, walking, and resting exercises. We show how to segment data:

  1. with the use of stride templates that were pre-computed based on data from an external study (attached to adeptdata package),
  2. by deriving new stride templates in a semi-manual manner.

See the Introduction to adept package vignette [2] for an introduction to the ADEPT method and usage examples of the segmentPattern function, which implements it.

Raw accelerometry data sample

The adeptdata package contains acc_running, a sample of raw accelerometry data collected during 25 minutes of an outdoor run. Data were collected at a sampling frequency of 100 Hz with two ActiGraph GT9X Link sensors located at the left hip and left ankle.

Running trial

The mapmyrun mobile tracking application was used during the 25-minute run (Patterson Park area, Baltimore, MD) in which the acc_running accelerometry data set was collected. According to the mobile app, the distance covered was approximately 3.35 km. A ground elevation plot generated by the mobile app shows signature trial characteristics (see figure below). The timestamps in the acc_running dataset match the mobile app to within ~1 minute.


Screenshot taken from a personal profile of mapmyrun tracking application, accessed via https://www.mapmyrun.com.

Sensor and accelerometry data

Data were collected with two ActiGraph GT9X Link physical activity monitors at a sampling frequency of 100 Hz. The ActiGraph GT9X Link has a 3-axis accelerometer collecting accelerometry data along three orthogonal axes. At a sampling frequency of 100 Hz, we collected 100 observations per second per axis (300 observations per second in total).

Sensor location

The first sensor (denoted “left_ankle”) was attached to the outer side of the left shoe with a slide-on clip, just below the ankle. The second sensor (denoted “left_hip”) was attached to the left side of an elastic belt worn at the hip (see image below). Both devices remained stable during the running trial.



Wearable accelerometer devices location during the experiment. The devices were still covered with a protective plastic foil.

Data set acc_running

To access the acc_running data, load the adeptdata package.
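A minimal sketch, assuming adeptdata is installed from CRAN:

```r
# Load adeptdata (assumed installed) and take a quick look at acc_running
if (requireNamespace("adeptdata", quietly = TRUE)) {
  library(adeptdata)
  print(head(acc_running, 3))
  print(dim(acc_running))
}
```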

acc_running consists of 300,000 observations of 5 variables:

Note on date_time column values
  • Values in the date_time column were generated via seq(from = as.POSIXct("2018-10-25 17:57:30.00", tz = "UTC"), by = 0.01, length.out = 150000); their display likely exhibits a floating-point arithmetic artifact (a well-known issue discussed, e.g., on Stack Overflow). Note the date_time range is not affected and spans from "2018-10-25 17:57:30.00 UTC" to "2018-10-25 18:22:29.99 UTC", inclusive.
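The timestamp grid from the note above can be reproduced and checked in base R:

```r
# Reproduce the date_time grid: 150,000 time points at 100 Hz per sensor location
date_time <- seq(
  from = as.POSIXct("2018-10-25 17:57:30.00", tz = "UTC"),
  by = 0.01, length.out = 150000
)
length(date_time)                   # 150000 per location (2 locations -> 300,000 rows)
diff(range(as.numeric(date_time)))  # ~1499.99 seconds, i.e. 25 minutes minus 0.01 s
```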

Accelerometry data visualization

Sub-second level accelerometry data

One way to visualize raw accelerometry data is to plot it as a three-dimensional time-series \((x,y,z)\). Here, we plot data from three different time frames, each 4 seconds long, simultaneously for data collected at the left ankle and the left hip.

Note on sensors desynchronization
  • The two sensors used in the experiment were set up to initialize data collection at the same time. However, as discussed in [3], perfect synchronization is impossible on most modern operating systems. Additionally, measurements may become desynchronized across devices after even a few minutes of data collection. Because of that, sub-second-level alignment of data across devices cannot be expected.

Vector magnitude

Vector magnitude \((vm)\) is often used to reduce the dimensionality of accelerometry time-series \((x,y,z)\). Vector magnitude is computed as \(vm = \sqrt{x^2 + y^2 + z^2}\) at each time point resulting in 1- instead of 3-dimensional time-series.
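In base R, the formula above is one line; a toy example (the values below are made up for illustration):

```r
# Vector magnitude from (x, y, z) samples
x <- c(0.10, 0.25); y <- c(0.95, 1.00); z <- c(0.30, 0.12)
vm <- sqrt(x^2 + y^2 + z^2)
vm  # one value per time point: 3-dimensional series reduced to 1 dimension
```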

Plots of \((x,y,z)\) and \((vm)\) show an asymmetric repetitive pattern corresponding to walking strides:

There are also visible differences in amplitudes and stride durations across the three vertical plot panels:

Vector magnitude count

In practice, it is often challenging to plot all data points collected at a sampling frequency of 100 Hz, even for a 25-minute time-series. One way to summarize such high-density accelerometry data is the vector magnitude count \((vmc)\), also known as the mean amplitude deviation. For \(\overline{vm}(t,H)\), the average of the \((vm)\) time-series over a time window of length \(H\) starting at time \(t\), we define \[\mathrm { vmc } ( t, H ) = \frac { 1 } { H } \sum _ { h = 0 } ^ { H - 1 } \left| vm ( t + h ) - \overline{vm}(t,H) \right|.\]
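The formula above translates directly into a small base R function over non-overlapping windows:

```r
# vmc over non-overlapping windows of H samples, per the formula above
vmc <- function(vm, H) {
  n_win <- floor(length(vm) / H)
  vapply(seq_len(n_win), function(i) {
    w <- vm[((i - 1) * H + 1):(i * H)]
    mean(abs(w - mean(w)))   # mean amplitude deviation within the window
  }, numeric(1))
}
vmc(c(0, 2, 0, 2, 1, 1), 2)  # -> 1 1 0
```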

Walking strides segmentation

To segment strides from the \((vm)\) time-series, we use stride accelerometry data templates. These templates are specific to a wearable sensor location (e.g., a left ankle-specific template). We demonstrate two approaches:

  1. Approach 1: Use stride templates derived from accelerometry data collected in a different experiment, with different participants. These templates are attached to the adeptdata package as the stride_template object; the data used to derive them are also attached to the adeptdata package as acc_walking_IU (see ?stride_template and ?acc_walking_IU for details).

  2. Approach 2: Derive stride templates semi-manually from acc_running data set.

Segmentation with Approach 1: use existing stride templates

Left ankle

We use the pre-computed stride_template object from adeptdata to build a template object, a list of left ankle-specific stride templates.
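A sketch, assuming adeptdata is installed; per the package documentation, stride_template holds a list per sensor location, and stride_template$left_ankle[[k]] is assumed to hold k templates as matrix rows:

```r
# Build a list of left ankle stride templates from stride_template {adeptdata}
if (requireNamespace("adeptdata", quietly = TRUE)) {
  library(adeptdata)
  template.la <- list(
    stride_template$left_ankle[[2]][1, ],
    stride_template$left_ankle[[2]][2, ]
  )
  sapply(template.la, length)  # template vector lengths
}
```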

We use the segmentPattern {adept} function to segment strides from the \((vm)\) accelerometry time-series.

Explanation of function arguments used:

  • x - A time-series to segment pattern occurrences from (here: the \((vm)\) accelerometry time-series).
  • x.fs - Data sampling frequency, expressed in the number of observations per second.
  • template - Pattern template(s).
  • pattern.dur.seq - A grid of potential stride durations, expressed in seconds.
  • similarity.measure - Statistic used to quantify similarity between the accelerometry time-series and a template.
  • x.adept.ma.W - Length of the smoothing window applied to the accelerometry time-series before computing similarity between the time-series and templates, expressed in seconds.
  • finetune - Procedure employed to fine-tune preliminarily identified beginnings and ends of a pattern in an accelerometry time-series.
  • finetune.maxima.ma.W - Length of the smoothing window used in the "maxima" fine-tune procedure, expressed in seconds.
  • finetune.maxima.nbh.W - Length of the local maxima search grid used in the "maxima" fine-tune procedure, expressed in seconds.
  • compute.template.idx - Whether or not to compute which of the (possibly multiple) templates best matched the accelerometry time-series.
  • run.parallel - Whether or not to use parallel execution in the algorithm.
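Putting the arguments together, an illustrative call might look as follows. This is a sketch, not the vignette's exact settings: it assumes the adept and adeptdata packages are installed and, to keep the run short, segments only the first 30 seconds of left ankle data.

```r
# Illustrative segmentPattern call on the first 30 s of left ankle data
if (requireNamespace("adept", quietly = TRUE) &&
    requireNamespace("adeptdata", quietly = TRUE)) {
  library(adept); library(adeptdata)
  acc_la <- acc_running[acc_running$loc_id == "left_ankle", ]
  vm_la  <- with(acc_la, sqrt(x^2 + y^2 + z^2))[1:3000]   # 30 s at 100 Hz
  template.la <- list(stride_template$left_ankle[[2]][1, ],
                      stride_template$left_ankle[[2]][2, ])
  out <- segmentPattern(
    x = vm_la,
    x.fs = 100,
    template = template.la,
    pattern.dur.seq = seq(0.5, 1.8, length.out = 20),
    similarity.measure = "cor",
    x.adept.ma.W = 0.15,
    finetune = "maxima",
    finetune.maxima.ma.W = 0.05,
    finetune.maxima.nbh.W = 0.2,
    compute.template.idx = TRUE,
    run.parallel = FALSE
  )
  head(out)
}
```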

See ?segmentPattern for a detailed explanation of all parameters. See Introduction to adept package vignette for explanation of segmentPattern {adept} parameters with simulated data examples.

The segmentation result is a data frame, where each row describes one identified pattern occurrence:

  • tau_i - index of x where the pattern occurrence starts,
  • T_i - pattern occurrence duration, expressed in x vector length,
  • sim_i - similarity between a template and x at the pattern occurrence,
  • template_i - index of the pattern template best matched to the time-series x at the pattern occurrence.

Results: estimated stride duration time

We estimate stride duration, expressed in seconds, as T_i divided by the data sampling frequency (here: \(100\)). Similarly, we estimate stride start, expressed in seconds after the run recording began, as tau_i divided by the data sampling frequency; to get the estimated stride start in minutes after the run recording began, we additionally divide by \(60\).
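With a 100 Hz sampling frequency, the conversion is simple arithmetic (the T_i and tau_i values below are made up for illustration):

```r
fs <- 100                      # sampling frequency (Hz)
T_i   <- c(98, 104, 180)       # illustrative stride durations, in samples
tau_i <- c(150, 30000, 90000)  # illustrative stride start indices
dur_sec   <- T_i / fs          # stride duration in seconds
start_sec <- tau_i / fs        # stride start, seconds after recording began
start_min <- start_sec / 60    # ... and in minutes
dur_sec  # 0.98 1.04 1.80
```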

We use the previously derived resting (\((vmc)<0.4\)) and non-resting (\((vmc)\geq 0.4\)) labels to filter out segmented data parts that likely do not correspond to running/walking. We then plot the estimated stride duration against the estimated stride start. We use different background shades to mark the derived resting/non-resting labels for subsequent \(3\)-minute windows.

  • We can see that the estimated stride duration (y-axis) is consistent between data collected at the left ankle and the left hip. Some of the observed differences may partially result from the different smoothing parameters used in the segmentation fine-tune procedure; left ankle data were smoothed less (finetune.maxima.ma.W = 0.05) than left hip data (finetune.maxima.ma.W = 0.15).

  • We may suppose that around the resting periods (light blue background), strides of both relatively long and short duration correspond to the runner walking (taking long slow steps, or very short steps). They may also be mislabeled resting data.

Results: subject-specific stride pattern (left ankle)

We now use the estimated stride start (tau_i) and stride duration (T_i) to retrieve accelerometry \((vm)\) time-series segments corresponding to segmented strides. We further align them in phase and scale to better observe stride patterns.

  • Based on the plot of aligned and scaled strides (right plot above), there appear to be at least two distinct stride patterns pronounced in the data: one with a “spike with a dip” in the middle of the stride pattern phase, and another, less frequent one (fewer lines on the plot), with a spike at approximately \(2/3\) of the stride pattern phase.

Correlation clustering of segmented walking strides

We can further use correlation clustering to group segmented walking strides.
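A minimal sketch of the idea using base R (stats) on simulated, phase-aligned strides; hierarchical clustering via hclust/cutree on a correlation-based distance is one of several ways to do this:

```r
# Correlation clustering sketch: strides_mat is a (n_strides x grid) matrix
# of phase-aligned, length-normalized vm strides (simulated here)
set.seed(1)
grid <- seq(0, 2 * pi, length.out = 200)
s1 <- sin(grid); s2 <- sin(2 * grid)
strides_mat <- rbind(
  t(replicate(5, s1 + rnorm(200, sd = 0.1))),  # 5 strides of pattern 1
  t(replicate(5, s2 + rnorm(200, sd = 0.1)))   # 5 strides of pattern 2
)
D  <- as.dist(1 - cor(t(strides_mat)))   # correlation distance between strides
cl <- cutree(hclust(D, method = "complete"), k = 2)
cl  # cluster assignment per stride
```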

We further plot the estimated stride duration time over the course of the run-trial exercise. We mark the pattern cluster assignment with color.

  • From the plot above we note that the pattern cluster assignment corresponds to the performed activity: strides marked in red likely correspond to running, and strides marked in green to walking.

Segmentation with Approach 2: derive stride templates semi-manually

We now demonstrate a semi-manual way of deriving stride patterns from scratch.

  1. Smooth accelerometry \((vm)\) time-series.

  2. Select a short segment, or multiple segments, from the smoothed \((vm)\) time-series.

  3. Within each selected \((vm)\) segment, automatically identify all local maxima.

  4. Within each selected \((vm)\) segment, identify a subset of local maxima that corresponds to stride beginnings and ends via visual inspection (the “manual” part).

  5. For each selected \((vm)\) segment, cut it at the subset of local maxima identified in step 4., interpolate the obtained parts to a common vector length, align them, standardize each to have mean 0 and variance 1, and compute their point-wise average. Standardize the average to have mean 0 and variance 1. For each \((vm)\) segment, the resulting point-wise average is one newly created template.

Here, in step 2., we select two \((vm)\) segments, each 6 seconds long, such that they correspond to different paces of running/walking. We apply the above procedure separately to data collected at the left hip and left ankle. That way, for each sensor location, we arrive at two distinct stride templates that we further use in segmentation.
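Steps 3.-5. can be sketched in base R on a simulated smoothed segment; the cut points below stand in for the visually selected subset of local maxima:

```r
# Semi-manual template derivation sketch: cut vm at chosen maxima,
# rescale parts to a common length, standardize, and average
set.seed(2)
vm_seg  <- rep(sin(seq(0, 2 * pi, length.out = 100)), 3) + rnorm(300, sd = 0.05)
cut_pts <- c(25, 125, 225)   # stand-ins for visually selected local maxima
parts   <- list(vm_seg[cut_pts[1]:cut_pts[2]], vm_seg[cut_pts[2]:cut_pts[3]])
common_len <- 200
parts_std <- lapply(parts, function(p) {
  # linear interpolation to the common length, then mean 0 / variance 1
  p2 <- approx(seq_along(p), p, xout = seq(1, length(p), length.out = common_len))$y
  as.numeric(scale(p2))
})
# point-wise average of the standardized parts, standardized once more
template_new <- as.numeric(scale(rowMeans(do.call(cbind, parts_std))))
```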

References


  1. Karas, M., Straczkiewicz, M., Fadel, W., Harezlak, J., Crainiceanu, C.M., Urbanek, J.K.: Adaptive empirical pattern transformation (ADEPT) with application to walking stride segmentation. Submitted to Biostatistics, 2018.

  2. Karas, M., Crainiceanu, C., Urbanek, J.: Introduction to adept package. Vignette to the ‘adept’ package.

  3. Karas, M., Bai, J., Straczkiewicz, M., Harezlak, J., Glynn, N.W., Harris, T., Zipunnikov, V., Crainiceanu, C., Urbanek, J.K.: Accelerometry data in health research: challenges and opportunities. Review and examples. Statistics in Biosciences, 2018.