Introduction to gratis

Bocong Zhao

About gratis

The name gratis stands for GeneRAting TIme Series with diverse and controllable characteristics. The package provides an efficient and general approach, based on Gaussian mixture autoregressive (MAR) models, for generating a wide range of non-Gaussian and nonlinear time series.

The generated data can serve as diverse and controllable benchmarking data in the time series domain. It can also be used to evaluate algorithms for tasks such as time series forecasting and classification, with minimal human effort and computational resources.

Introduction to the gratis mechanism

By simulating time series from mixture autoregressive models, gratis can generate series that cover a wide range of the time series feature space, which makes it possible to investigate the diversity of series in that space.

Furthermore, by tuning the parameters of the mixture autoregressive model, gratis can also efficiently generate new time series with controllable features.
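To make the idea of a mixture autoregressive model concrete, here is a minimal base R sketch. It assumes the simplest case, a mixture of Gaussian AR(1) components: at each time step one component is picked at random according to the mixing weights, and the next value is drawn from that component. The helper simulate_mar() and all parameter values are hypothetical and are not part of gratis, which works with more general models.

```r
# Simulate one series from a K-component Gaussian mixture AR(1) model.
# simulate_mar() is a hypothetical helper for illustration, not a gratis function.
simulate_mar <- function(n, weights, phi0, phi1, sigma, burn_in = 100) {
  stopifnot(abs(sum(weights) - 1) < 1e-8)
  total <- n + burn_in
  x <- numeric(total)
  # Pick the active component for every time step according to the weights
  comp <- sample(seq_along(weights), size = total, replace = TRUE, prob = weights)
  for (t in 2:total) {
    k <- comp[t]
    x[t] <- phi0[k] + phi1[k] * x[t - 1] + rnorm(1, sd = sigma[k])
  }
  # Drop the burn-in so the starting value has little influence
  x[(burn_in + 1):total]
}

set.seed(1)
y <- simulate_mar(
  n = 120,
  weights = c(0.7, 0.3),   # mixing weights, must sum to 1
  phi0 = c(0, 1),          # component intercepts
  phi1 = c(0.5, -0.4),     # component AR(1) coefficients
  sigma = c(1, 2)          # component standard deviations
)
y_ts <- ts(y, frequency = 12)
```

Because the components differ in their coefficients and variances, the resulting series can look non-Gaussian and nonlinear even though every component is a plain Gaussian AR model.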

# load package
library(gratis)

Generate diverse time series

We use the function generate_ts() to generate diverse time series.

The generation process draws the parameters of the underlying models from distributions instead of using fixed values, which allows it to produce diverse time series instances. The diversity of the generated series should therefore not depend on any particular parameter setting.
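The following base R sketch illustrates what drawing parameters from distributions means in practice. It is only an illustration under simple assumptions (uniform distributions, AR(1) components); the helper random_mar_series() is hypothetical, and the distributions that gratis actually uses may differ.

```r
# Hypothetical helper: draw random parameters, then simulate one mixture AR(1) series.
random_mar_series <- function(n, n_comp) {
  w <- runif(n_comp)
  w <- w / sum(w)                      # random mixing weights that sum to 1
  phi <- runif(n_comp, -0.9, 0.9)      # random AR(1) coefficients inside (-1, 1)
  sigma <- runif(n_comp, 0.5, 2)       # random component standard deviations
  comp <- sample(seq_len(n_comp), n, replace = TRUE, prob = w)
  x <- numeric(n)
  for (t in 2:n) {
    x[t] <- phi[comp[t]] * x[t - 1] + rnorm(1, sd = sigma[comp[t]])
  }
  list(x = ts(x), pars = list(phi = phi, sigma = sigma, weights = w))
}

set.seed(42)
# Each call draws a fresh parameter set, so the three series differ in character
series <- lapply(1:3, function(i) random_mar_series(n = 120, n_comp = 2))
names(series) <- paste0("N", 1:3)
```

Since no parameter is fixed by the user, repeated calls explore different parts of the parameter space without any manual tuning.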

Definitions

The arguments of the function generate_ts() are defined as follows:

Parameter   Definition
n.ts        number of time series to be generated
freq        seasonal period of the time series to be generated
nComp       number of mixing components when simulating time series using MAR models
n           length of the generated time series

Example

Suppose we want to use a MAR model to generate 3 time series from random parameter spaces. Each time series has a seasonal period of 12 (for example, monthly data), 2 mixing components and a length of 120.

# Generate diverse time series
x <- generate_ts(n.ts = 3, freq = 12, nComp = 2, n = 120)

Output

The result contains 3 simulated time series, named N1, N2 and N3. In this example we use N1 for further analysis.

As requested, the MAR model used in the simulation has 2 mixing components, whose parameters are stored in pars1 and pars2.

Each component has its own mixing weight, stored in weights. The weights sum to 1.

# N1 time series
x$N1$pars
#> $pars1
#> [1]  0.49753913 -0.06801653 -0.51451890  0.18994534
#> 
#> $pars2
#> [1]  1.103459646 -0.004361339 -0.770376614  0.434467342
#> 
#> $weights
#> [1] 0.6804942 0.3195058
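As a quick sanity check on this structure, the sketch below rebuilds the printed list by hand and inspects it. The values are copied from the output above; the object name pars is only a stand-in for x$N1$pars, so the snippet runs without gratis.

```r
# Values copied from the printed output; `pars` mimics x$N1$pars
pars <- list(
  pars1 = c(0.49753913, -0.06801653, -0.51451890, 0.18994534),
  pars2 = c(1.103459646, -0.004361339, -0.770376614, 0.434467342),
  weights = c(0.6804942, 0.3195058)
)

n_comp <- length(pars$weights)                          # number of mixing components
weights_ok <- isTRUE(all.equal(sum(pars$weights), 1))   # weights form a probability vector
dominant <- which.max(pars$weights)                     # component with the largest weight
```

Here the first component carries about 68% of the weight, so it is the one most often active when the series is simulated.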

Plot time series

# plot N1 time series
autoplot(x$N1$x)

Generate multiple seasonal time series

Time series can exhibit multiple seasonal patterns of different lengths, especially when the series is observed at a high frequency, such as daily or hourly data.

We use the function generate_msts() to generate multiple seasonal time series.
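To show what multiple seasonal periods look like in data, here is a small base R sketch. It is a deterministic toy construction (two sine waves plus noise) with made-up amplitudes, not the MAR-based procedure that generate_msts() uses; it only shows how a weekly and a yearly pattern can coexist in one daily series.

```r
set.seed(2024)
n <- 800
time_index <- seq_len(n)

weekly <- sin(2 * pi * time_index / 7)        # pattern repeating every 7 observations
yearly <- 2 * sin(2 * pi * time_index / 365)  # pattern repeating every 365 observations
y <- weekly + yearly + rnorm(n, sd = 0.5)

# The weekly pattern shows up as a higher autocorrelation at lag 7 than at lag 3
r <- as.numeric(acf(y, lag.max = 7, plot = FALSE)$acf)
acf_lag3 <- r[4]  # element 1 is lag 0, so lag 3 is element 4
acf_lag7 <- r[8]
```

A model with a single seasonal period would capture only one of these two patterns, which is why such series need dedicated handling.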

Definitions

The arguments of the function generate_msts() are defined as follows:

Parameter          Definition
seasonal.periods   a vector of seasonal periods of the time series to be generated
nComp              number of mixing components when simulating time series using MAR models
n                  length of the generated time series

Example

Suppose we want to use a MAR model to generate a time series of length 800 with 2 mixing components from random parameter spaces. In particular, this time series has two seasonal periods, 7 and 365.

# Generate multiple seasonal time series
x <- generate_msts(seasonal.periods = c(7, 365), n = 800, nComp = 2)

Plot time series

autoplot(x)

Generate time series with controllable features

An analysis with a particular focus may only be interested in a certain region of the feature space, or in a subset of features.

The function generate_ts_with_target() efficiently generates time series with target features.

The underlying principle is to use a genetic algorithm to tune the MAR parameters until the distance between the target feature vector and the feature vector of a sample of time series simulated from the MAR model is close to 0.
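The search idea can be sketched in base R on a much smaller problem. The example below is a simplified evolutionary search (selection and mutation only, no crossover) and is not the genetic algorithm gratis runs: it tunes a single AR(1) coefficient so that one feature, the lag-1 autocorrelation, approaches a target of 0.6. The fitness is the negative distance to the target, so values closer to 0 are better. All names and settings are assumptions chosen for illustration.

```r
# Feature to control: lag-1 autocorrelation of a simulated AR(1) series.
# Fixed innovations keep the fitness deterministic for a given phi.
set.seed(123)
eps <- rnorm(300)
simulate_ar1 <- function(phi) as.numeric(stats::filter(eps, phi, method = "recursive"))
acf1 <- function(x) acf(x, lag.max = 1, plot = FALSE)$acf[2]

target <- 0.6
# Fitness is the negative distance to the target, so larger is better
fitness <- function(phi) -abs(acf1(simulate_ar1(phi)) - target)

pop <- runif(20, -0.9, 0.9)               # initial population of candidate phi values
first_best <- max(sapply(pop, fitness))
for (iter in 1:30) {
  fit <- sapply(pop, fitness)
  elite <- pop[order(fit, decreasing = TRUE)[1:5]]         # selection: keep the 5 fittest
  children <- rep(elite, each = 3) + rnorm(15, sd = 0.05)  # mutation: perturb copies
  pop <- c(elite, pmin(pmax(children, -0.95), 0.95))       # elitism: the best always survive
}
final_fit <- sapply(pop, fitness)
best_phi <- pop[which.max(final_fit)]
best_fitness <- max(final_fit)
```

Because the fittest candidates are carried over unchanged, the best fitness can never get worse from one iteration to the next. In the real setting the candidate is a full MAR parameter vector and the distance is computed over several features at once.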

Definitions

The arguments of the function generate_ts_with_target() are defined as follows:

Parameter           Definition
n                   number of time series to be generated
ts.length           length of the time series to be generated
freq                frequency of the time series to be generated
seasonal            0 for non-seasonal data, 1 for single-seasonal data, and 2 for multiple seasonal data
features            a vector of function names
selected.features   selected features to be controlled
target              target feature values
parallel            optional argument specifying whether the genetic algorithm should be run sequentially or in parallel

Example

Suppose we want to use a MAR model to generate 1 non-seasonal time series with frequency 1 and length 60. In particular, this time series has two selected features, entropy and trend, with target values of 0.6 and 0.9, respectively.

x <- generate_ts_with_target(
  n = 1, ts.length = 60, freq = 1, seasonal = 0,
  features = c('entropy', 'stl_features'),
  selected.features = c('entropy', 'trend'),
  target = c(0.6, 0.9),
  parallel = FALSE
)
#> GA | iter = 1 | Mean = -13.56082680 | Best =  -0.05842758
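To give a feel for what a trend target of 0.9 means, the base R sketch below computes a simple trend strength measure, 1 - Var(remainder) / Var(series), with the trend estimated by a smoother. This follows the general idea of a trend strength feature for non-seasonal data, but trend_strength() is a hypothetical helper and the exact definition used by the feature functions in the example may differ.

```r
# Hypothetical helper: trend strength in [0, 1], 1 - Var(remainder) / Var(series),
# with the trend estimated by Friedman's super smoother (stats::supsmu).
trend_strength <- function(x) {
  time_index <- seq_along(x)
  trend <- supsmu(time_index, x)$y
  remainder <- x - trend
  max(0, min(1, 1 - var(remainder) / var(x)))
}

set.seed(7)
n <- 60
strong_trend <- 0.2 * seq_len(n) + rnorm(n, sd = 1)  # clear upward drift plus noise
no_trend <- rnorm(n)                                 # pure noise, no drift

s_strong <- trend_strength(strong_trend)
s_none <- trend_strength(no_trend)
```

A value near 1 means that most of the variation in the series is explained by its smooth trend, while a value near 0 means that the series is dominated by irregular fluctuations.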

Plot time series

autoplot(x)