The gratis package generates time series with diverse and controllable characteristics. It provides an efficient and general approach, based on Gaussian mixture autoregressive (MAR) models, for generating a wide range of non-Gaussian and nonlinear time series.
The generated datasets can serve as diverse and controllable benchmarking data in the time series domain. They can also be used to evaluate algorithms for tasks such as time series forecasting and classification, with minimal human effort and computational resources.
By simulating time series from mixture autoregressive models, gratis can cover a wide range of time series and can be used to investigate their diversity in a time series feature space.
Furthermore, by tuning the parameters of the mixture autoregressive model, gratis can efficiently generate new time series with controllable features.
# load package
library(gratis)
We use the function generate_ts() to generate diverse time series.
Our generation process uses distributions instead of fixed parameter values in the underlying models, which allows it to generate diverse time series instances. The diversity of the generated time series should therefore not depend on any particular parameter setting.
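The idea of drawing model parameters from distributions can be illustrated with a minimal base R sketch. The helper draw_mar_parameters() and the uniform distributions it uses are hypothetical choices made for illustration. They are not the distributions gratis uses internally.

```r
# Sketch: draw MAR parameters from distributions rather than fixing them.
# The distributions below are illustrative assumptions, not those used by gratis.
set.seed(123)
n_comp <- 2

draw_mar_parameters <- function(n_comp) {
  raw <- runif(n_comp)                       # unnormalised mixing weights
  list(
    weights = raw / sum(raw),                # weights sum to 1
    phi     = runif(n_comp, -0.9, 0.9),      # one AR(1) coefficient per component
    sigma   = runif(n_comp, 0.5, 2)          # innovation standard deviations
  )
}

# Each draw defines a different model, so repeated draws yield diverse series.
param_sets <- lapply(1:3, function(i) draw_mar_parameters(n_comp))
str(param_sets[[1]])
```

Because every call produces a new parameter set, the variety of the resulting series comes from the sampling step, not from values chosen by hand.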
Definitions
Here are the definitions of parameter settings in function generate_ts():
Parameter | Definition |
---|---|
n.ts | number of time series to be generated |
freq | seasonal period of the time series to be generated |
nComp | number of mixing components when simulating time series using MAR models |
n | length of the generated time series |
Example
Suppose we want to use MAR models to generate 3 time series from random parameter spaces. Each time series has a seasonal period of 12, 2 mixing components and length 120.
# Generate diverse time series
x <- generate_ts(n.ts = 3, freq = 12, nComp = 2, n = 120)
Output
We can see that 3 different time series have been simulated: N1, N2 and N3. In this example we use time series N1 for further analysis.
As requested, two mixing components, pars1 and pars2, were used when simulating the time series with MAR models.
Each component has its own mixing weight, reported in weights.
# N1 time series
x$N1$pars
#> $pars1
#> [1] 0.49753913 -0.06801653 -0.51451890 0.18994534
#>
#> $pars2
#> [1] 1.103459646 -0.004361339 -0.770376614 0.434467342
#>
#> $weights
#> [1] 0.6804942 0.3195058
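To make the role of the components and weights concrete, here is a minimal base R sketch of a two-component Gaussian mixture autoregressive process. At each step one component is chosen with probability given by its weight, and the next value is produced by that component's autoregression. All coefficients are hypothetical. They are not the values printed above, and the sketch does not claim to reproduce how gratis interprets pars1 and pars2.

```r
# Minimal sketch of a two-component Gaussian mixture AR(1) process.
# All coefficients below are hypothetical; they are not taken from gratis.
set.seed(1)
n       <- 120
weights <- c(0.7, 0.3)        # mixing weights, must sum to 1
phi0    <- c(0, 1)            # intercept of each component
phi1    <- c(0.5, -0.4)       # AR(1) coefficient of each component
sigma   <- c(1, 2)            # innovation standard deviation of each component

y <- numeric(n)               # y[1] starts at 0
for (i in 2:n) {
  k <- sample(2, size = 1, prob = weights)   # pick a component for this step
  y[i] <- phi0[k] + phi1[k] * y[i - 1] + rnorm(1, sd = sigma[k])
}
y <- ts(y, frequency = 12)    # monthly-style series of length 120
```

Switching between components with different dynamics and variances is what lets a mixture of simple Gaussian autoregressions produce non-Gaussian, nonlinear behaviour.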
Plot time series
# plot N1 time series
autoplot(x$N1$x)
Time series can exhibit multiple seasonal patterns of different lengths, especially when the series is observed at a high frequency, such as daily or hourly data.
We use the function generate_msts() to generate multiple seasonal time series.
Definitions
Here are the definitions of parameter settings in function generate_msts():
Parameter | Definition |
---|---|
seasonal.periods | a vector of seasonal periods of the time series to be generated |
nComp | number of mixing components when simulating time series using MAR models |
n | length of the generated time series |
Example
Suppose we want to use a MAR model to generate a time series with 2 mixing components and length 800 from random parameter spaces. In particular, this time series has two seasonal periods, 7 and 365.
# Generate multiple seasonal time series
x <- generate_msts(seasonal.periods = c(7, 365), n = 800, nComp = 2)
Plot time series
autoplot(x)
A time series analysis with a particular focus may only be interested in a certain area of the feature space or in a subset of features.
The function generate_ts_with_target() can efficiently generate time series with target features.
The underlying principle is that a genetic algorithm tunes the MAR parameters until the distance between the target feature vector and the feature vector of a sample of time series simulated from the MAR model is close to 0.
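The feature-matching idea can be sketched in base R. This example is a simplified stand-in, not the gratis implementation. It replaces the genetic algorithm with a plain random search, uses a single AR(1) model instead of a MAR model, and uses the lag-1 autocorrelation as the only feature. The helpers lag1_acf() and simulate_ar1() are hypothetical names introduced for illustration.

```r
# Sketch of feature matching with a plain random search standing in for
# the genetic algorithm. Model, feature and helper names are hypothetical.
set.seed(42)

# Feature of interest: lag-1 autocorrelation of a series.
lag1_acf <- function(y) acf(y, lag.max = 1, plot = FALSE)$acf[2]

# Simulate an AR(1) series for a candidate coefficient.
simulate_ar1 <- function(phi, n = 200) {
  as.numeric(arima.sim(model = list(ar = phi), n = n))
}

target <- 0.6
best <- list(phi = NA, dist = Inf)
for (i in seq_len(200)) {
  phi  <- runif(1, min = -0.9, max = 0.9)            # candidate parameter
  dist <- abs(lag1_acf(simulate_ar1(phi)) - target)  # distance to the target
  if (dist < best$dist) best <- list(phi = phi, dist = dist)
}
best
```

A genetic algorithm follows the same loop of simulating, measuring features and comparing with the target, but proposes new candidates by recombining and mutating the best ones found so far instead of sampling them independently.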
Definitions
Here are the definitions of parameter settings in function generate_ts_with_target():
Parameter | Definition |
---|---|
n | number of time series to be generated |
ts.length | length of the time series to be generated |
freq | frequency of the time series to be generated |
seasonal | 0 for non-seasonal data, 1 for single-seasonal data, and 2 for multiple seasonal data |
features | a vector of function names |
selected.features | selected features to be controlled |
target | target feature values |
parallel | an optional argument specifying whether the genetic algorithm should be run sequentially or in parallel |
Example
Suppose we want to use a MAR model to generate 1 non-seasonal time series with frequency 1 and length 60. In particular, we control two selected features, entropy and trend, with target values of 0.6 and 0.9 respectively.
# Generate a time series with target features
x <- generate_ts_with_target(
  n = 1, ts.length = 60, freq = 1, seasonal = 0,
  features = c('entropy', 'stl_features'),
  selected.features = c('entropy', 'trend'),
  target = c(0.6, 0.9),
  parallel = FALSE
)
#> GA | iter = 1 | Mean = -13.56082680 | Best = -0.05842758
Plot time series
autoplot(x)