litteR is a user-friendly tool for analyzing litter data (e.g., beach litter data). The current version (0.8.1) contains routines for:
The focus of this version of litteR is to provide a user-friendly, flexible, robust, transparent, and relatively simple tool for litter analysis. Although litteR is distributed as an R package, no experience with R is required. For more information on how to install R, RStudio, and litteR, please consult our installation guide.
Litter data are count data. As illustrated in the histogram below (copied with permission from Hanke et al., 2019), litter data generally have skewed distributions. All procedures in litteR are based on robust statistical methods: they require no distributional assumptions and are relatively insensitive to outliers.
This user guide consists of two parts: the first part describes the user interface, and the second part provides technical details.
For applications of (a previous version of) litteR, see Schulz et al. (2019). litteR is the successor of the Litter Analyst software (Schulz et al., 2017).
Before litteR can be used, it must be installed, or updated if you installed litteR before. See our installation guide for details.
You need to install litteR only once, but you need to load this package each time you start RStudio.
The litteR-package should be loaded in RStudio before you can use it. This can be done by running the following code in the R-console or the RStudio-console:
```r
library(litteR)  # load litteR; needed in every new R session
```
A startup message appears that gives some essential instructions for using litteR.
The easiest way to start working with litteR is to create an empty project directory. This directory can be filled with example and reference files by running the following code in the RStudio console:

```r
# fill an existing directory with example and reference files
create_litter_project("d:/work/litter-projects/beach-litter")
```

For more information on how to obtain and use RStudio, consult its website or read our installation guide.
The argument of function `create_litter_project` (i.e., the quoted part in parentheses) is an existing work directory on your computer. This can be any valid directory name with sufficient user privileges. Note for MS-Windows users: R requires forward slashes (/) in paths!
It is also possible to run `create_litter_project()` without an argument. In that case, a simple graphical user interface pops up for interactive directory selection.
litteR can be started by typing `litter()` in the RStudio console (see the figure below).
After entering `litter()`, a simple graphical user interface pops up for file selection. An example of a file selection dialogue is given below.
litteR needs three input files: a type file, a data file, and a settings file.
These input files are described below.
The type file contains a list of all litter types that may be used in the data file. It also indicates to which litter group(s) each litter type belongs. Two example files, named ‘types-ospar.csv’ and ‘types-ospar-tc-sup-fish-plastic.csv’, are automatically generated by the `create_litter_project` function, as described earlier in this tutorial. A type file assigns each litter type (`type_name`) to one or more litter groups. The first 10 rows of ‘types-ospar-tc-sup-fish-plastic.csv’ are given in the table below.
type_name | included | SUP | FISH | PLASTIC |
---|---|---|---|---|
Plastic: Yokes [1] | x | x | x | |
Plastic: Bags [2] | x | x | x | |
Plastic: Small_bags [3] | x | x | x | |
Plastic: Bag_ends [112] | x | x | x | |
Plastic: Drinks [4] | x | x | x | |
Plastic: Cleaner [5] | x | x | x | |
Plastic: Food [6] | x | x | x | |
Plastic: Toiletries [7] | x | x | x | |
Plastic: Oil_small [8] | x | x | | |
Plastic: Oil_large [9] | x | x | | |
The following columns are in this table:

- `type_name`: this column is required and lists all litter types that are allowed in the data file. Litter types in this column need to be unique;
- `included`: this column indicates whether the type specified in column `type_name` will be used in the analysis. Only types that are included in the analysis contribute to the total litter count (TC);
- `SUP`, `FISH`, `PLASTIC`, etc.: these columns define the litter groups. The example above contains three groups: ‘single use plastics’ (SUP), ‘fisheries related litter’ (FISH), and ‘plastics’ (PLASTIC). A cross (x) means that the litter type in `type_name` is a member of the group; an empty cell means that it is not.

You may use one of the provided type files as a template for your own type file. litteR uses the type file specified in the settings file.
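As an illustration of how these columns relate, the sketch below reads a small type file with base R and derives the included types (those contributing to TC) and the members of a group. The inline CSV text and its group memberships are hypothetical, as is the helper `group_members()`; the sketch also assumes that excluded types do not count towards any group. litteR reads and validates the type file itself.

```r
# Hypothetical type file, inlined as text (group memberships are illustrative)
types_csv <- "type_name,included,SUP,FISH,PLASTIC
Plastic: Yokes [1],,x,,x
Plastic: Bags [2],x,x,,x
Plastic: Small_bags [3],x,x,,x
Plastic: Oil_small [8],x,,,x
Plastic: String [32],x,,x,x"

types <- read.csv(text = types_csv, check.names = FALSE,
                  stringsAsFactors = FALSE)

# Types marked 'x' in column 'included' contribute to the total count (TC)
tc_types <- types$type_name[types$included %in% "x"]

# Members of a group: included types with an 'x' in the group column
# (%in% also treats NA, e.g. from a completely empty column, as 'not a member')
group_members <- function(types, group) {
  is_included <- types$included %in% "x"
  is_member <- types[[group]] %in% "x"
  types$type_name[is_included & is_member]
}

group_members(types, "SUP")
```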
litteR supports a simple and flexible data format, similar to the OSPAR format. The data are stored in so-called wide format: each row refers to a single survey, each column to a single litter type or metadata item. The table below gives an example of (a small) part of a data file.
spatial_code | date | country_code | Plastic: Bags [2] | Plastic: Small_bags [3] |
---|---|---|---|---|
Bergen | 2012-01-27 | NL | 3 | 9 |
Bergen | 2012-04-20 | NL | 8 | 12 |
Bergen | 2012-07-22 | NL | 1 | 5 |
Bergen | 2012-10-19 | NL | 2 | 4 |
Bergen | 2013-02-19 | NL | 24 | 23 |
Bergen | 2013-04-11 | NL | 0 | 9 |
Bergen | 2013-07-20 | NL | 10 | 4 |
Bergen | 2013-10-16 | NL | 7 | 5 |
Bergen | 2014-01-08 | NL | 9 | 20 |
Bergen | 2014-04-23 | NL | 10 | 29 |
The columns `spatial_code` and `date` are required. The remaining columns are either litter types or optional metadata. Each litter type should also be present in the type file; only litter types in the type file are valid. Columns that are neither valid litter types nor `spatial_code` or `date` are treated as optional metadata columns. These columns do not affect the results. In the example above, country codes have been added as metadata.
The column `spatial_code` gives the spatial aggregation level. All analysis results are presented at this spatial aggregation level. In the example above, the spatial aggregation level is the beach where the litter surveys took place. However, higher spatial aggregation levels, such as the country or regional level, can also be used. The `date` column gives the monitoring date in ISO format, i.e., YYYY-mm-dd (for example 2020-07-03, to indicate 3 July 2020). For convenience, the OSPAR format (dd/mm/YYYY) is currently also supported (for example 03/07/2020, to indicate 3 July 2020).
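Because both date formats may occur in a data file, each one has to be recognised explicitly. The sketch below shows one way to do that in base R; the helper name `parse_survey_date()` is hypothetical, and litteR's own date handling may differ, for example in how it reports invalid dates.

```r
# Parse survey dates given in ISO format (YYYY-mm-dd) or OSPAR format (dd/mm/YYYY).
# Values matching neither pattern, or impossible dates, become NA.
parse_survey_date <- function(x) {
  x <- trimws(x)
  is_iso <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
  is_ospar <- grepl("^[0-9]{2}/[0-9]{2}/[0-9]{4}$", x)
  out <- rep(as.Date(NA), length(x))
  out[is_iso] <- as.Date(x[is_iso], format = "%Y-%m-%d")
  out[is_ospar] <- as.Date(x[is_ospar], format = "%d/%m/%Y")
  out
}

# the first two entries both give 2020-07-03; the last one gives NA
parse_survey_date(c("2020-07-03", "03/07/2020", "3 July 2020"))
```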
The third column, `country_code`, in the data file given above is not required and will not be used in the analysis. The remaining columns, `Plastic: Bags [2]`, `Plastic: Small_bags [3]`, etc., contain the counts for the litter types defined in the type file.
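To show how the wide format relates to the type file, the sketch below reads the first two Bergen surveys from the example table, separates litter-type columns from metadata columns, and sums the counts per survey. The vectors `valid_types` and `sup_types` are hypothetical stand-ins for information from a type file, and computing TC as the row sum of the included types is an assumption based on the description of the `included` column, not litteR's actual code.

```r
# First two surveys from the example data table, in wide format
data_csv <- "spatial_code,date,country_code,Plastic: Bags [2],Plastic: Small_bags [3]
Bergen,2012-01-27,NL,3,9
Bergen,2012-04-20,NL,8,12"

# check.names = FALSE keeps column names such as 'Plastic: Bags [2]' intact
surveys <- read.csv(text = data_csv, check.names = FALSE,
                    stringsAsFactors = FALSE)

# Hypothetical stand-ins for the type file (all types assumed to be included)
valid_types <- c("Plastic: Bags [2]", "Plastic: Small_bags [3]")
sup_types <- "Plastic: Bags [2]"

# Litter-type columns versus optional metadata columns
type_cols <- intersect(names(surveys), valid_types)
meta_cols <- setdiff(names(surveys), c("spatial_code", "date", type_cols))

# Total count (TC) and a group total per survey (one row = one survey)
surveys$TC <- rowSums(surveys[, type_cols, drop = FALSE])
surveys$SUP <- rowSums(surveys[, intersect(type_cols, sup_types), drop = FALSE])
```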
The settings file contains all settings needed to run litteR. An example of the contents of a settings file is given in the figure below.
```yaml
# litteR settings file
# Period to analyse (YYYY-mm-dd)
date_min: 2012-01-01
date_max: 2017-12-31
# Percentage of total count to analyse (0 < percentage_total_count <= 100)
percentage_total_count: 80
# Data file. Note: the datafile must be in the same path as the settings file
file_data: beach-litter-nl-2012-2017.csv
# Type file. Defines the types and their groups
file_types: types-ospar.csv
# Select trend figures to plot in the report
spatial_code: ["Noordwijk", "Terschelling"]
group_code: ["TC", "SUP", "FISH"]
type_name: ["Plastic: Bags [2]"]
figure_quality: low
```
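The comments in the settings file above state the constraints on several entries. The sketch below checks those constraints on a settings object, assuming the file has already been parsed into a named list (for example with a YAML reader); the list is inlined so the example runs without extra packages. The helper `check_settings()`, its messages, and the rule that `date_min` must not be later than `date_max` are assumptions, not litteR's own validation code.

```r
# Settings as a named list, e.g. after parsing settings.yaml with a YAML reader
settings <- list(
  date_min = "2012-01-01",
  date_max = "2017-12-31",
  percentage_total_count = 80,
  figure_quality = "low"
)

# Returns a character vector of problems (empty if all checks pass)
check_settings <- function(s) {
  as_iso_date <- function(x) {
    if (length(x) != 1) return(as.Date(NA))
    as.Date(as.character(x), format = "%Y-%m-%d")
  }
  problems <- character(0)
  date_min <- as_iso_date(s$date_min)
  date_max <- as_iso_date(s$date_max)
  if (is.na(date_min) || is.na(date_max)) {
    problems <- c(problems, "'date_min'/'date_max' must be dates in YYYY-mm-dd format")
  } else if (date_min > date_max) {
    problems <- c(problems, "'date_min' must not be later than 'date_max'")
  }
  p <- s$percentage_total_count
  if (!is.numeric(p) || length(p) != 1 || is.na(p) || p <= 0 || p > 100) {
    problems <- c(problems, "'percentage_total_count' must satisfy 0 < value <= 100")
  }
  if (!isTRUE(s$figure_quality %in% c("high", "low"))) {
    problems <- c(problems, "'figure_quality' must be 'high' or 'low'")
  }
  problems
}

check_settings(settings)  # character(0): no problems found
```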
The settings file contains the following entries:

- `date_min` and `date_max`: the first and final date of the period to analyze. Dates should be given in ISO format, i.e., YYYY-mm-dd (for example 2020-07-03, to indicate 3 July 2020);
- `percentage_total_count`: the percentage of the total count used to estimate statistics. See the section on descriptive statistics for more information;
- `file_data`: name of the data file (including its path, e.g., c:/my-litter-directory/my-litter-data.csv);
- `file_types`: name of the type file (including its path, e.g., c:/my-litter-directory/types-ospar.csv);
- `spatial_code`: name(s) of location(s) to plot. Spatial codes should be available in column `spatial_code` of the data file;
- `group_code`: name(s) of group(s) to plot. Litter groups should be available as column names in the type file;
- `type_name`: name(s) of type(s) to plot. Type names should be available in both the type file and the data file;
- `figure_quality`: quality of the plots in the report, either `high` or `low`.

All input files are validated by litteR. The following validation rules apply:
litteR produces three output files: an HTML report, a CSV file with statistics, and a log file.
For convenience, all input and output files are stored as a snapshot in a directory with a name like `litteR-results-20200602T141829`, where the final part of the name is a timestamp.
litteR produces an HTML report that is best viewed with a modern web browser like Mozilla Firefox, Google Chrome, or Safari. These browsers are freely available from the internet.
The filename of each report starts with ‘litteR-results’, followed by a timestamp (YYYYmmddTHHMMSS) and the extension .html. For example: litteR-results-20200602T141829.html
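The timestamp in these names follows the pattern YYYYmmddTHHMMSS. A minimal sketch of how such a name can be built with base R's `format()` is given below; the helper `results_name()` is hypothetical and not part of litteR.

```r
# Build names like 'litteR-results-20200602T141829' (optionally with an extension)
results_name <- function(time = Sys.time(), ext = NULL) {
  stamp <- format(time, "%Y%m%dT%H%M%S")  # e.g. 20200602T141829
  name <- paste0("litteR-results-", stamp)
  if (!is.null(ext)) name <- paste0(name, ".", ext)
  name
}

t0 <- as.POSIXct("2020-06-02 14:18:29", tz = "UTC")
results_name(t0, "html")  # "litteR-results-20200602T141829.html"
```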
This section briefly describes each section of the HTML report.
This section gives a summary of the settings in the settings file.
In this section (potential) problems in the input files are reported. These problems are also stored in the log file.
For each `spatial_code` in the data file, adjusted box-and-whisker plots of the total count are given for the detection of outliers. Outliers (if any) are shown as dots. For skewed distributions, adjusted boxplots are more suitable for outlier detection than traditional boxplots. An example of these box-and-whisker plots is given below.
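Adjusted boxplots (Hubert and Vandervieren) shift the whisker limits according to the medcouple, a robust measure of skewness, so that the long right tail typical of litter counts is not automatically flagged. The base-R sketch below computes adjusted fences with the usual constants (1.5, -4 and 3) around the hinges returned by `fivenum()`. It uses a naive O(n^2) medcouple that skips tied pairs at the median, which simplifies the exact definition. The robustbase package provides `adjbox()` and `mc()` for production use; whether litteR uses them is not stated here, and the totals in the example are hypothetical.

```r
# Naive O(n^2) medcouple: a robust skewness measure in [-1, 1].
# Simplification: pairs with xi == xj (both equal to the median) are skipped.
medcouple_naive <- function(x) {
  m <- median(x)
  upper <- x[x >= m]
  lower <- x[x <= m]
  h <- outer(upper, lower, function(xi, xj) {
    ifelse(xi == xj, NA, ((xi - m) - (m - xj)) / (xi - xj))
  })
  median(h, na.rm = TRUE)
}

# Adjusted boxplot fences; points outside them are flagged as outliers
adjusted_fences <- function(x, coef = 1.5) {
  hinges <- fivenum(x)[c(2, 4)]      # lower and upper hinge, as in boxplot()
  iqr <- hinges[2] - hinges[1]
  mc <- medcouple_naive(x)
  if (is.na(mc)) mc <- 0             # e.g. all values identical
  if (mc >= 0) {
    c(lower = hinges[1] - coef * exp(-4 * mc) * iqr,
      upper = hinges[2] + coef * exp(3 * mc) * iqr)
  } else {
    c(lower = hinges[1] - coef * exp(-3 * mc) * iqr,
      upper = hinges[2] + coef * exp(4 * mc) * iqr)
  }
}

total_count <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 100)  # hypothetical totals
fences <- adjusted_fences(total_count)
total_count[total_count < fences["lower"] | total_count > fences["upper"]]  # 100
```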
For each `spatial_code` and group/type name, the following statistics are estimated:
These statistics are estimated for all litter groups, and for the litter types with the greatest counts that together make up a given percentage of the total count. This percentage is given as `percentage_total_count` in the settings file.
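One reading of the selection rule described above: rank the types by their summed count, then keep the smallest set of top-ranked types whose cumulative share of the total count reaches `percentage_total_count`. The sketch below implements that reading. Both the exact rule (for example, whether the type that crosses the threshold is kept) and the helper `select_top_types()` are assumptions, and the counts are hypothetical.

```r
# type_totals: named vector of summed counts per litter type
select_top_types <- function(type_totals, percentage = 80) {
  ranked <- sort(type_totals, decreasing = TRUE)
  cum_share <- 100 * cumsum(ranked) / sum(ranked)
  # first position where the cumulative share reaches the threshold;
  # the small tolerance guards against rounding when percentage = 100
  n_keep <- which(cum_share >= percentage - 1e-9)[1]
  names(ranked)[seq_len(n_keep)]
}

type_totals <- c("Plastic: Bags [2]" = 50, "Plastic: Caps [15]" = 30,
                 "Plastic: Drinks [4]" = 15, "Plastic: Food [6]" = 5)
select_top_types(type_totals, percentage = 80)
```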
The descriptive statistics for the litter types and groups are stored in a CSV file with a name starting with `litteR-results` and ending with a timestamp. The statistics for litter groups are also printed as a table and shown as bar plots in the report: one plot for each spatial aggregation level, as defined in the `spatial_code` column of the data file. An example is given in the figure below. If you want other groups, or only a subset of groups, you should modify the type file.
For each spatial code, and for the type names and group codes specified in the settings file, trends are estimated by means of the Theil-Sen slope estimator: a robust, non-parametric estimator of slope (counts per year). The significance of the estimated slopes is tested by means of the Mann-Kendall test, which is also non-parametric and therefore makes no distributional assumptions about the data.
The figure below gives examples of trend plots for total count (TC), single use plastics (SUP), and plastic bags at the beach of Terschelling (The Netherlands). In each plot, the black dots are the observations, the thin gray line segments connect the dots and guide the eye, and the red line is the Theil-Sen slope.
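The Theil-Sen estimator and the Mann-Kendall test described above are simple enough to write out in base R. The sketch below computes the Theil-Sen slope as the median of all pairwise slopes, and a two-sided Mann-Kendall p-value from the usual normal approximation with corrections for tied counts and for continuity. Expressing time in years (days / 365.25) to obtain a slope in counts per year is an assumption, the helper names are hypothetical, and litteR's implementation (for example its handling of ties or small samples) may differ.

```r
# Theil-Sen slope: median of the slopes between all pairs of observations
theil_sen_slope <- function(t, y) {
  ij <- combn(length(y), 2)             # all index pairs i < j
  dt <- t[ij[2, ]] - t[ij[1, ]]
  dy <- y[ij[2, ]] - y[ij[1, ]]
  keep <- dt != 0                       # skip pairs observed at the same time
  median(dy[keep] / dt[keep])
}

# Two-sided Mann-Kendall test: normal approximation of the S statistic
mann_kendall_p <- function(t, y) {
  y <- y[order(t)]                      # the test needs time-ordered data
  n <- length(y)
  ij <- combn(n, 2)
  s <- sum(sign(y[ij[2, ]] - y[ij[1, ]]))
  ties <- table(y)                      # tie correction for repeated counts
  var_s <- (n * (n - 1) * (2 * n + 5) -
              sum(ties * (ties - 1) * (2 * ties + 5))) / 18
  if (s == 0) return(1)
  z <- (s - sign(s)) / sqrt(var_s)      # continuity correction
  2 * pnorm(-abs(z))
}

# Bergen counts of 'Plastic: Bags [2]' from the example data table
dates <- as.Date(c("2012-01-27", "2012-04-20", "2012-07-22", "2012-10-19",
                   "2013-02-19", "2013-04-11", "2013-07-20", "2013-10-16",
                   "2014-01-08", "2014-04-23"))
counts <- c(3, 8, 1, 2, 24, 0, 10, 7, 9, 10)
years <- as.numeric(dates) / 365.25     # time in years
theil_sen_slope(years, counts)          # slope in counts per year
mann_kendall_p(years, counts)
```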
In addition to the report, a CSV file with descriptive statistics for each spatial code is produced. An example of such a table is given below. See the section on descriptive statistics for more details.
spatial_code | from | to | type_name | %TC | mean | median | cv | rmad | n | slope | p_value |
---|---|---|---|---|---|---|---|---|---|---|---|
Bergen | 2012-01-27 | 2017-10-11 | TC | 100.00 | 377.0 | 302.0 | 0.731 | 0.779 | 24 | 40.000 | 0.1230 |
Bergen | 2012-01-27 | 2017-10-11 | PLASTIC | 88.80 | 341.0 | 270.0 | 0.757 | 0.847 | 24 | 35.600 | 0.1130 |
Bergen | 2012-01-27 | 2017-10-11 | FISH | 41.40 | 162.0 | 104.0 | 0.862 | 0.951 | 24 | 13.000 | 0.1230 |
Bergen | 2012-01-27 | 2017-10-11 | plastic: string [32] | 27.90 | 120.0 | 78.0 | 0.967 | 1.250 | 24 | 17.000 | 0.0370 |
Bergen | 2012-01-27 | 2017-10-11 | SUP | 25.40 | 91.4 | 73.0 | 0.728 | 0.782 | 24 | 5.370 | 0.2230 |
Bergen | 2012-01-27 | 2017-10-11 | plastic: plastic_small [117] | 9.47 | 42.5 | 21.5 | 1.170 | 1.100 | 24 | 3.120 | 0.2130 |
Bergen | 2012-01-27 | 2017-10-11 | plastic: plastic_large [46] | 8.04 | 24.7 | 17.5 | 0.785 | 0.508 | 24 | 0.660 | 0.4210 |
Bergen | 2012-01-27 | 2017-10-11 | plastic: fishing_net_small [115] | 7.53 | 23.2 | 4.0 | 1.540 | 1.480 | 24 | -2.280 | 0.0395 |
Bergen | 2012-01-27 | 2017-10-11 | plastic: caps [15] | 4.98 | 20.4 | 16.0 | 1.030 | 0.741 | 24 | 2.310 | 0.0940 |
Bergen | 2012-01-27 | 2017-10-11 | RUBBER | 4.75 | 15.0 | 13.5 | 0.738 | 0.659 | 24 | 0.941 | 0.2060 |
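The columns `cv` and `rmad` in the table above are relative measures of spread. Assuming the common definitions, cv = sd / mean and rmad = mad / median (with R's default scaling constant 1.4826 in `mad()`), the statistics for one series of counts can be computed as in the sketch below. These definitions and the helper `describe_counts()` are assumptions, not taken from litteR's source; the example uses the Bergen counts of Plastic: Bags [2] from the data table.

```r
# Descriptive statistics for one series of survey counts (assumed definitions)
describe_counts <- function(x) {
  x <- x[!is.na(x)]
  c(n      = length(x),
    mean   = mean(x),
    median = median(x),
    cv     = sd(x) / mean(x),          # coefficient of variation
    rmad   = mad(x) / median(x))       # relative median absolute deviation
}

# Bergen counts of 'Plastic: Bags [2]' from the example data table
counts <- c(3, 8, 1, 2, 24, 0, 10, 7, 9, 10)
round(describe_counts(counts), 3)
```

Note that rmad is undefined (infinite) when the median count is zero, which can happen for rare litter types.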
litteR's log file is very helpful for understanding warnings and error messages. It stores a description of all data analysis steps in chronological order. Part of a log file is given below.
```
2020-06-02 14:18:29 [INFO] Starting a new litteR session
2020-06-02 14:18:29 [INFO] litteR version: 0.8.0
2020-06-02 14:18:29 [INFO] litteR release date: 2020-01-31
2020-06-02 14:18:29 [INFO] Reading settings file ‘settings.yaml’
2020-06-02 14:18:29 [INFO] Check optional settings...
2020-06-02 14:18:29 [INFO] Check existence of required settings...
2020-06-02 14:18:29 [INFO] All required settings are available
2020-06-02 14:18:29 [INFO] Checking settings 'date_min' and 'date_max'
2020-06-02 14:18:29 [INFO] Settings 'date_min' and 'date_max' are valid
2020-06-02 14:18:29 [INFO] Checking setting 'percentage_total_count'
2020-06-02 14:18:29 [INFO] Setting 'percentage_total_count' is valid
2020-06-02 14:18:29 [INFO] Checking setting 'figure_quality'
2020-06-02 14:18:29 [INFO] Setting 'figure_quality' is valid
2020-06-02 14:18:29 [INFO] Settings file has been read
2020-06-02 14:18:29 [INFO] Constructing filename for report
2020-06-02 14:18:29 [INFO] Filename ‘litteR-results-20200602T141829.html’ created
2020-06-02 14:18:29 [INFO] Construct filename for storing statistics
2020-06-02 14:18:29 [INFO] Filename ‘litteR-results-20200602T141829.csv’ created
2020-06-02 14:18:29 [INFO] Starting litter analysis
2020-06-02 14:18:29 [INFO] Checking parameters in settings file
```
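Because every log line has the same structure, a long log file is easy to filter, for example to list only warnings and errors. The base-R sketch below splits lines like those in the excerpt above into timestamp, level and message; `parse_log()` is a hypothetical helper, and its pattern assumes exactly the layout shown in the excerpt.

```r
# Split log lines such as
# "2020-06-02 14:18:29 [INFO] Starting a new litteR session"
# into timestamp, level and message; lines that do not match are dropped.
parse_log <- function(lines) {
  pattern <- paste0("^([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}) ",
                    "\\[(INFO|WARN|ERROR)\\] (.*)$")
  lines <- lines[grepl(pattern, lines)]
  data.frame(
    time = sub(pattern, "\\1", lines),
    level = sub(pattern, "\\2", lines),
    message = sub(pattern, "\\3", lines),
    stringsAsFactors = FALSE
  )
}

# In practice the lines would come from readLines() on the log file
log_lines <- c(
  "2020-06-02 14:18:29 [INFO] Starting a new litteR session",
  "2020-06-02 14:18:29 [INFO] Settings file has been read"
)
events <- parse_log(log_lines)
events[events$level %in% c("WARN", "ERROR"), ]   # only warnings and errors
```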
Each line contains a single log event and always has the following format:

- the date and time of the event (YYYY-mm-dd HH:MM:SS);
- the level of the event between square brackets: `INFO` for informative messages, `WARN` for warnings, and `ERROR` for errors;
- a description of the event.

The runtime error messages and the log file should provide clear information about errors in the data file and settings, and about warnings (points of attention). For additional information, consult the points below:
- After entering `litter()` in the RStudio console, a file dialogue should appear. If that is not the case, the file dialogue is probably hidden behind RStudio (see the task manager or use ALT-TAB on MS-Windows to navigate to the hidden file dialogue);
- If you get the error message `invalid multibyte string`, your input file contains a character that is not part of the English alphabet. Replacing this character by a valid character in the range A-Z or a-z usually solves this problem.

Hanke, G., Walvoort, D., van Loon, W., Addamo, A. M., Brosich, A., del Mar Chaves Montero, M., Molina Jack, M. E., Vinci, M., & Giorgetti, A. (2019). EU Marine Beach Litter Baselines. EUR 30022 EN. Publications Office of the European Union, Luxembourg. ISBN 978-92-76-14243-0, doi: 10.2760/16903, JRC114129.
Schulz, M., van Loon, W., Fleet, D. M., Baggelaar, P., & van der Meulen, E. (2017). OSPAR standard method and software for statistical analysis of beach litter data. Marine Pollution Bulletin, 122(1-2), 166-175.
Schulz, M., Walvoort, D. J. J., Barry, J., Fleet, D. M., & van Loon, W. M. G. M. (2019). Baseline and power analyses for the assessment of beach litter reductions in the European OSPAR region. Environmental Pollution, 248, 555-564. https://doi.org/10.1016/j.envpol.2019.02.030