1. Introduction

The dendroTools R package provides some novel dendroclimatological methods to analyse the relationship between tree-ring and environmental data. In this document we give details about how to use our package. All the data included in examples bellow is a part of the dendroTools R package.

Please note, the examples presented here are made computationally less intensive to satisfy the CRAN policy. You are invited to explore the full potential of our package by using the wide range of possible window widths.

2. The use of daily_response()

The daily_response() works by sliding a moving window through daily environmental data and calculates statistical metrics with one or more tree ring proxies. Possible metrics are correlation coefficient, coefficient of determination (r squared) or adjusted coefficient of determination (adjusted r squared). In addition to linear regression, it is possible to use nonlinear artificial neural network with Bayesian regularization training algorithm (brnn). In general, user can use a fixed or progressive window for calculations of moving averages. To use a fixed window, select its width by assigning an integer to the argument fixed_width. To use many different windows, define the lower_limit and upper_limit arguments. In this case, all window widths between the lower and upper limits will be considered. In the later text, window width representative of a specific day of year (DOY) is defined as the values for this particular day and number of subsequent days corresponding to window width. All calculated metrics are stored in a matrix, which is later used to find the optimal window (i.e. optimal consecutive sequence of days), that returns the highest calculated metric. This selection is later used to calculate temporal stability and cross validation.

2.1 Glimpse daily data

The glimpse_daily_data() is used for visual depiction of daily environmental data. Especially, it is useful to spot missing values and to evaluate the suitability of using specific daily data.

# Load the dendroTools R package
library(dendroTools)

# Load data
data(LJ_daily_temperatures)
glimpse_daily_data(env_data = LJ_daily_temperatures, tidy_env_data = FALSE, na.color = "white")
Figure 1: Glimpse of daily temperautres for meteorological station Ljubljana.

Figure 1: Glimpse of daily temperautres for meteorological station Ljubljana.

2.2 Example of analysing the relationship between mean vessel area (MVA) parameter and daily temperature data with fixed window width

In this example, we analyse the relationship between MVA and daily temperature data for wind width of 60 days (argument fixed_width = 60). Importantly, the row_names_subset argument is set to TRUE. This argument automatically subsets both data frames (i.e., environmental and tree-ring data) and keeps only matching years, which are used for calculations. To use this feature, years must be included as row names. All insignificant correlations are removed by setting the argument remove_insignificant to TRUE. The threshold for significance is set with the alpha argument.

# Load the dendroTools R package
library(dendroTools)

# Load data
data(data_MVA)
data(LJ_daily_temperatures)

# Example with fixed width
example_fixed_width <- daily_response(response = data_MVA, env_data = LJ_daily_temperatures,
                                   method = "cor", fixed_width = 60,
                                   row_names_subset = TRUE, remove_insignificant = TRUE,
                                   alpha = 0.05)
example_fixed_width$plot_extreme
Figure 2: The MVA parameter contains the optimal temperature signal from March 14 (DOY 73) to May 12 (DOY 132).

Figure 2: The MVA parameter contains the optimal temperature signal from March 14 (DOY 73) to May 12 (DOY 132).

2.3 Example of analysing the statistical relationship between mean vessel area (MVA) parameter and daily temperature data for past and present

In the exploration of tree-climate relationships, researchers might be interested in how this relationship is different between past and present. The key argument for such analysis is subset_years, which defines the subset of years to be analysed. In the following example, we will analyse the relationship between MVA and daily temperature data for two periods, 1940 – 1980 and 1981 – 2010. All possible window widths between 50 and 70 days, including the previous year, will be considered. The latter is achieved by setting the previous_year argument to TRUE. Importantly, the row_names_subset argument is set to TRUE. This argument automatically subsets both data frames (i.e., environmental and tree-ring data) and keeps only matching years, which are used for calculations. To use this feature, years must be included as row names. All insignificant correlations are removed by setting the argument remove_insignificant to TRUE. The threshold for significance is set with the alpha argument.

After the function finish with all calculations, we plot the temporal patterns for past and present. In addition, we are interested in specific window width with span of 60 days (use plot_specific_window argument).

# Load the dendroTools R package
library(dendroTools)

# Load the data
data(data_MVA)
data(LJ_daily_temperatures)

# Example for past and present
example_MVA_past <- daily_response(response = data_MVA, env_data = LJ_daily_temperatures,
                              method = "cor", lower_limit = 50, upper_limit = 70,
                              row_names_subset = TRUE, previous_year = TRUE,
                              remove_insignificant = TRUE, alpha = 0.05, 
                              plot_specific_window = 60, subset_years = c(1940, 1980))

example_MVA_present <- daily_response(response = data_MVA, env_data = LJ_daily_temperatures,
                                   method = "cor", lower_limit = 50, upper_limit = 70,
                                   row_names_subset = TRUE, previous_year = TRUE,
                                   remove_insignificant = TRUE, alpha = 0.05, 
                                   plot_specific_window = 60, subset_years = c(1981, 2010))
example_MVA_past$plot_heatmap
Figure 3: The temporal correlations pattern for the past

Figure 3: The temporal correlations pattern for the past

example_MVA_present$plot_heatmap
Figure 4: The temporal correlations pattern for the present

Figure 4: The temporal correlations pattern for the present

example_MVA_past$plot_specific
Figure 5: The temporal correlations for pre-defined window of 60 days for past years

Figure 5: The temporal correlations for pre-defined window of 60 days for past years

example_MVA_present$plot_specific
Figure 6: The temporal correlations for pre-defined window of 60 days for present years

Figure 6: The temporal correlations for pre-defined window of 60 days for present years

The climate-tree relationships between past and present are similar. However, there is a three days of shift in optimal window between past and present.

2.4 Principal component analysis in combination with the daily_response()

Principal Component Analysis (PCA) is commonly used with tree-ring data to reduce the full set of original tree-ring chronologies to a more manageable set of transformed variables. These transformed variables, the set of principal component scores, are then used as predictors in climate reconstruction models. The PCA also acts to strengthen the common regional-scale climate response within a group of tree-ring chronologies by concentrating the common signal in the components with the largest eigenvalues.

To use PCA regression within the daily_response(), set the argument PCA_transformation to TRUE. All variables in the response data frame will be transformed using PCA transformation. If the parameter log_preprocess is set to TRUE, variables will be transformed with logarithmic transformation before used in PCA. With the argument components_selection, we specify how to select PC scores that will be used as predictors. There are three options: “automatic”, “manual” and “plot_selection”. If argument is set to “automatic”, all PC scores with eigenvalues above 1 will be selected. This threshold could be changed by changing the eigenvalues_threshold argument. If argument is set to “manual”, user should set the number of components with N_components argument. If components_selection is set to “plot_selection”, A scree plot will be shown, and a user must manually enter the number of components to be used as predictors. The latter seems to be the most reasonable choice.

In our example, we use PCA on 10 individual Mean Vessel Area (MVA) chronologies (example_proxies_individual). For climate data, LJ_daily_temperatures data is used. The selection of components is set to “manual”, N_components of components to be used in the later analysis is set to 2. All window widths between 60 and 70 days, will be considered. Importantly, the row_names_subset argument is set to TRUE. This argument automatically subsets both data frames (i.e., env_data and response) and keeps only matching years, which are used for calculations. To use this feature, years must be included as row names. All insignificant correlations are removed by setting the argument remove_insignificant to TRUE. The threshold for significance is controlled by the alpha argument.

# Load the dendroTools R package
library(dendroTools)

# Load data
data(example_proxies_individual)
data(LJ_daily_temperatures)

# Example PCA
example_PCA <- daily_response(response = example_proxies_individual, 
                              env_data = LJ_daily_temperatures, method = "lm", 
                              lower_limit = 60, upper_limit = 70,
                              row_names_subset = TRUE, remove_insignificant = TRUE,
                              alpha = 0.01, PCA_transformation = TRUE,
                              components_selection = "manual", N_components = 2)
## Warning in daily_response(response = example_proxies_individual, env_data
## = LJ_daily_temperatures, : Your response data frame has more than 1 column!
## Are you doing a multiproxy research? If so, OK. If not, check your response
## data frame!
# Get the summary statistics for the PCA
summary(example_PCA$PCA_output)
## Importance of components:
##                           Comp.1    Comp.2    Comp.3     Comp.4     Comp.5
## Standard deviation     1.8691406 1.5007125 1.1867201 0.87168877 0.79432916
## Proportion of Variance 0.3493686 0.2252138 0.1408305 0.07598413 0.06309588
## Cumulative Proportion  0.3493686 0.5745824 0.7154129 0.79139704 0.85449292
##                            Comp.6     Comp.7     Comp.8     Comp.9
## Standard deviation     0.68818504 0.61467322 0.57233447 0.49611738
## Proportion of Variance 0.04735987 0.03778232 0.03275667 0.02461325
## Cumulative Proportion  0.90185279 0.93963510 0.97239178 0.99700502
##                            Comp.10
## Standard deviation     0.173060035
## Proportion of Variance 0.002994978
## Cumulative Proportion  1.000000000
example_PCA$plot_heatmap
Figure 7: The temporal pattern for the r.squared. The highest coefficients of determination were calculated for DOY around 90 with span of 2 months.

Figure 7: The temporal pattern for the r.squared. The highest coefficients of determination were calculated for DOY around 90 with span of 2 months.

2.5 Example with negative correlations

The daily_response() recognizes negative relationship (negative correlations) between proxy and daily climate data. Here we show a simple example with negative correlation coefficients. For the demonstration, we use data_TRW_1 and LJ_daily_temperatures.

# Load the dendroTools R package
library(dendroTools)

# Load data
data(data_TRW_1)
data(LJ_daily_temperatures)

# Example negative correlations
data(data_TRW_1)
example_neg_cor <- daily_response(response = data_TRW_1, env_data = LJ_daily_temperatures,
                              method = "cor", lower_limit = 50, upper_limit = 70,
                              row_names_subset = TRUE, remove_insignificant = TRUE,
                              alpha = 0.05)
example_neg_cor$plot_heatmap
Figure 8: The temporal pattern of correlation coefficients.

Figure 8: The temporal pattern of correlation coefficients.

example_neg_cor$plot_extreme
Figure 9: The highest calculated correlation coefficient.

Figure 9: The highest calculated correlation coefficient.

# The temporal stability of correlations
example_neg_cor$temporal_stability
##        Period correlation p value
## 1 1961 - 1986      -0.399  0.0437
## 2 1987 - 2012      -0.653  0.0003

2.6 The example of climate reconstruction with linear model

Reconstructions of past climate conditions include reconstructions of past temperature, precipitation, vegetation, streamflow, sea surface temperature, and other climatic or climate-dependent conditions. With the daily_response(), we first find the optimal sequence of consecutive days that returns the highest metric. In this example, we use data_TRW and daily temperatures from Kredarica (KRE_daily_temperatures). The temporal stability of statistical relationship will be analysed with “progressive” method, using 3 splits (k = 3). Progressive check splits data into k splits, calculates metric for the first split and then progressively adds 1 split at a time and calculates selected metric. The argument cross_validation_type is set to “randomized”, so years will be shuffled before the cross-validation test.

# Load the dendroTools R package
library(dendroTools)

# Load data
data(data_TRW)
data(KRE_daily_temperatures)

example_reconstruction_lin <- daily_response(response = data_TRW, 
                                             env_data = KRE_daily_temperatures, 
                                             method = "lm", metric = "r.squared", 
                                             lower_limit = 30, upper_limit = 40,
                                             row_names_subset = TRUE, 
                                             temporal_stability_check = "progressive",
                                             cross_validation_type = "randomized", k = 3)
example_reconstruction_lin$plot_extreme
Figure 10: The highest calculated coefficient of determination.

Figure 10: The highest calculated coefficient of determination.

The highest r squared was calculated for the period from May 15 to June 27. The aggregated (averaged) daily data for this period is saved in a data frame $optimized_return. Before reconstructing the May 15 to June 27 mean temperature, we should check temporal stability, cross validation and transfer function.

example_reconstruction_lin$temporal_stability
##        Period r.squared p value
## 1 1955 - 1963     0.076      NA
## 2 1955 - 1972     0.225      NA
## 3 1955 - 1981     0.353      NA
example_reconstruction_lin$cross_validation
##   CV      Period       cor      RMSE      RRSE         d         RE
## 1  1 Calibration 0.4238738 0.7865510 0.9057212 0.5383935         NA
## 2  1  Validation 0.7308775 0.9160096 0.7260079 0.7523635  0.4847585
## 3  2 Calibration 0.6737807 0.8361800 0.7389313 0.7776504         NA
## 4  2  Validation 0.2005997 0.8045473 1.0972321 0.4677102 -0.1227808
## 5  3 Calibration 0.6069432 0.8265142 0.7947452 0.7180365         NA
## 6  3  Validation 0.5743455 0.8125381 0.8286392 0.6428229  0.3133811
##           CE         DE
## 1         NA         NA
## 2  0.4729126  0.3900488
## 3         NA         NA
## 4 -0.2039184 -0.2208221
## 5         NA         NA
## 6  0.3133570 -0.1797066
example_reconstruction_lin$transfer_function
Figure 11: Linear transfer function

Figure 11: Linear transfer function

A sample code for climate reconstruction with the lm method:

linear_model <- lm(Optimized_return ~ TRW, data = example_reconstruction_lin$optimized_return)
reconstruction <- data.frame(predictions = predict(linear_model, newdata = data_TRW))
plot(row.names(data_TRW), reconstruction$predictions, type = "l", xlab = "Year", ylab = "Mean temperature May 15 - Jun 27 [ºC]")
Figure 12: The reconstructed average temperature May 15 - June 27 with linear model

Figure 12: The reconstructed average temperature May 15 - June 27 with linear model

2.7 The example of climate reconstruction with nonlinear brnn model

In this example, we use the same data and procedure as in previous one, with the exception of method. Here, we perform a climate reconstruction with the brnn method.

# Load the dendroTools and brnn R package
library(dendroTools)
library(brnn)
## Warning: package 'brnn' was built under R version 3.5.1
## Loading required package: Formula
## Warning: package 'Formula' was built under R version 3.5.2
# Load data
data(data_TRW)
data(KRE_daily_temperatures)

example_reconstruction_brnn <- daily_response(response = data_TRW, 
                                              env_data = KRE_daily_temperatures, 
                                              method = "brnn", metric = "r.squared", 
                                              lower_limit = 43, upper_limit = 44,
                                              row_names_subset = TRUE, 
                                              temporal_stability_check = "progressive",
                                              cross_validation_type = "randomized", k = 3)
example_reconstruction_brnn$plot_extreme
Figure 13: The highest calculated coefficient of determination for the brnn model.

Figure 13: The highest calculated coefficient of determination for the brnn model.

The results for the brnn model are the same as for the linear model. The highest r squared was calculated for the period from May 15 to June 27. The aggregated (averaged) daily data for this period is saved in a data frame $optimized_return. Before reconstructing the May 15 to June 27 mean temperature, we should check temporal stability, cross validation and transfer function.

example_reconstruction_brnn$temporal_stability
##        Period r.squared p value
## 1 1955 - 1963    -0.013      NA
## 2 1955 - 1972     0.203      NA
## 3 1955 - 1981     0.348      NA
example_reconstruction_brnn$cross_validation
##   CV      Period       cor      RMSE      RRSE         d         RE
## 1  1 Calibration 0.6975478 0.7451396 0.7194525 0.7859123         NA
## 2  1  Validation 0.1025902 0.9201386 1.1249580 0.4019135 0.05536934
## 3  2 Calibration 0.5150708 0.7671952 0.8638733 0.5699447         NA
## 4  2  Validation 0.7045370 0.9506308 0.8067756 0.5947517 0.34916817
## 5  3 Calibration 0.5530828 0.8620919 0.8392695 0.6255399         NA
## 6  3  Validation 0.6280176 0.7246750 0.8594632 0.7038090 0.43409402
##           CE         DE
## 1         NA         NA
## 2 -0.2655305 -1.4336232
## 3         NA         NA
## 4  0.3491131  0.3489657
## 5         NA         NA
## 6  0.2613230  0.2504679
example_reconstruction_brnn$transfer_function
Figure 14: Nonlinear brnn transfer function

Figure 14: Nonlinear brnn transfer function

A sample code for climate reconstruction with the brnn method:

brnn_model <- brnn(Optimized_return ~ TRW, data = example_reconstruction_brnn$optimized_return)
## Number of parameters (weights and biases) to estimate: 6 
## Nguyen-Widrow method
## Scaling factor= 1.4 
## gamma= 1.9556     alpha= 0.9443   beta= 3.749
reconstruction <- data.frame(predictions = predict(brnn_model, newdata = data_TRW))
plot(row.names(data_TRW),reconstruction$predictions, type = "l", xlab = "Year",
     ylab = "Mean temperature May 15 - Jun 27 [ºC]")
Figure 15: The reconstructed average temperature May 15 - June 27 with the nonlinear brnn model

Figure 15: The reconstructed average temperature May 15 - June 27 with the nonlinear brnn model

2.8 Example of using daily_response() with daily precipitation and daily temperature data

The current version of the daily_response() does not allow for using more than 1 daily environmental data, but the analysis with, e.g., daily temperature and precipitation data could be done separately and visualised together. In the following example, we analyse the relationship between Mean Vessel Area (MVA) chronology and daily temperature and precipitation data for the period 1960 – 2010. The period is defined with the subset_ years argument.

The daily precipitation data is in tidy form, i.e. data frame with three columns representing “Year”, “DOY” and “Precipitation”. When tidy data is used, the argument tidy_env_data has to be set to TRUE. Additionally, to aggregate daily precipitation data, it makes more sense to use sum instead of mean. To do so, set the argument aggregate_function to ‘sum’.

# Load the dendroTools and brnn R package
library(dendroTools)

# Load data
data(data_MVA)
data(LJ_daily_temperatures)
data(LJ_daily_precipitation)

# Example with precipitation and temperatures
example_MVA_TEMP <- daily_response(response = data_MVA, env_data = LJ_daily_temperatures,
                                   method = "cor", lower_limit = 50, upper_limit = 70,
                                   row_names_subset = TRUE, previous_year = FALSE,
                                   remove_insignificant = TRUE, alpha = 0.05, 
                                   tidy_env_data = FALSE, subset_years = c(1960, 2010))

example_MVA_PREC <- daily_response(response = data_MVA, env_data = LJ_daily_precipitation,
                                   method = "cor", lower_limit = 50, upper_limit = 70,
                                   row_names_subset = TRUE, previous_year = FALSE,
                                   remove_insignificant = TRUE, alpha = 0.05, 
                                   tidy_env_data = TRUE, subset_years = c(1960, 2010), 
                                   aggregate_function = "sum")
library(gridExtra)
grid.arrange(example_MVA_TEMP$plot_heatmap, example_MVA_PREC$plot_heatmap)
Figure 16: The temporal pattern of correlatioins for temperatures (upper plot) and precipitation (lower plot)

Figure 16: The temporal pattern of correlatioins for temperatures (upper plot) and precipitation (lower plot)

grid.arrange(example_MVA_TEMP$plot_extreme, example_MVA_PREC$plot_extreme)
Figure 17: The highest calculated correlations for temperatures (upper plot) and precipitation (lower plot)

Figure 17: The highest calculated correlations for temperatures (upper plot) and precipitation (lower plot)

2.9 Example of multiproxy analysis

The daily_response() enables the analysis of the relationship between daily environmental data and more than one response (proxy) variable, simultaneously. However, users should select multiple proxies reasonably and with caution, since there is nothing to prevent from including colinear variables. Using several proxies will result in higher explained variance but at the cost of degrees of freedom. In those cases, users should check the temporal stability of relationship and the cross-validation result. If metrics on validation data are much lower than on calibration data, there is a problem of overfitting and users should exclude some proxies and repeat the analysis. Importantly, when using the multiproxy approach, use the adjusted r squared metric!

# Load the dendroTools and brnn R package
library(dendroTools)

# Example of multiproxy analysis
data(example_proxies_1)
data(LJ_daily_temperatures)

# Summary of the example_proxies_1 data frame
summary(example_proxies_1)
##       MVA               O18                TRW        
##  Min.   :0.03812   Min.   :-2.12315   Min.   :0.7851  
##  1st Qu.:0.04204   1st Qu.:-0.54584   1st Qu.:0.8918  
##  Median :0.04482   Median :-0.04609   Median :1.0049  
##  Mean   :0.04532   Mean   : 0.04739   Mean   :1.0197  
##  3rd Qu.:0.04752   3rd Qu.: 0.48215   3rd Qu.:1.1159  
##  Max.   :0.05608   Max.   : 2.66724   Max.   :1.3782

There are three proxy variables: Tree-ring width (TRW), stable oxygen isotope ratio (O18) and Mean Vessel Area (MVA).

cor(example_proxies_1)
##           MVA       O18       TRW
## MVA 1.0000000 0.2114934 0.0342514
## O18 0.2114934 1.0000000 0.2702639
## TRW 0.0342514 0.2702639 1.0000000

The correlation matrix shows only low correlations among the three proxies.

example_multiproxy <- daily_response(response = example_proxies_1, 
                                     env_data = LJ_daily_temperatures, 
                                     method = "lm", metric = "adj.r.squared", 
                                     lower_limit = 60, upper_limit = 70, 
                                     row_names_subset = TRUE, previous_year = FALSE, 
                                     remove_insignificant = TRUE, alpha = 0.05)
## Warning in daily_response(response = example_proxies_1, env_data =
## LJ_daily_temperatures, : Your response data frame has more than 1 column!
## Are you doing a multiproxy research? If so, OK. If not, check your response
## data frame!
example_multiproxy$plot_heatmap
Figure 18: The temporal pattern of r squared for the multiproxy example)

Figure 18: The temporal pattern of r squared for the multiproxy example)

The temporal pattern shows similar result than for the example with MVA only. Therefore, the highest percentage of explained variance is probable due to the MVA variable. TRW and O18 are probable not significant variables. Let’s check this out.

liner_model <- lm(Optimized_return ~ ., data = example_multiproxy$optimized_return)
summary(liner_model)
## 
## Call:
## lm(formula = Optimized_return ~ ., data = example_multiproxy$optimized_return)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.68389 -0.48430  0.01942  0.50162  2.10595 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.0844     1.5262   0.711    0.481    
## MVA         217.4850    27.0948   8.027 1.32e-10 ***
## O18           0.0762     0.1303   0.585    0.561    
## TRW           1.3799     0.8560   1.612    0.113    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8507 on 51 degrees of freedom
## Multiple R-squared:  0.5931, Adjusted R-squared:  0.5692 
## F-statistic: 24.78 on 3 and 51 DF,  p-value: 4.964e-10

For the multiproxy approach, there is no transfer function created.

example_multiproxy$transfer_function
## [1] "No transfer function is created for two or more response variables"