1.0) Preparing your data

Hugo Flávio

2020-08-01

Index

  1. Preparing your data
    1. Structuring the study area
    2. Creating a distances matrix
    3. The preload() function
  2. explore()
    1. Processes behind explore()
    2. Inspecting the explore() results
  3. migration()
    1. Processes behind migration()
    2. Inspecting the migration() results
    3. One-way efficiency estimations
  4. residency()
    1. Processes behind residency()
    2. Inspecting the residency() results
    3. Multi-way efficiency estimations
  5. Manual mode
  6. Beyond the three main analyses

Preparing your data

Actel requires that you organise your input files in a specific fashion. Critically, you should have a file named biometrics.csv, one named spatial.csv, and one named deployments.csv (we will look into the file contents in detail soon). Finally, you also need a detections directory, containing all your detection files in .csv format.

To create a blank workspace ready to be used, run createWorkspace(), and a template will be generated automatically in an “actel_workspace” directory. If you would like to, you can change the name of the target directory by using the argument dir. Remember to delete the example rows before including your data!


Note for advanced R users:
You can now run actel analyses using R objects as an input, rather than input files.
Read more about it in this dedicated page.

Biometrics file

Your biometrics file should look similar to this:

Release.date Serial.nr Signal Length.mm Weight.g Group Release.site
2018-02-01 10:05:00 12340001 1 150 40 Wild Site A
2018-02-01 10:10:00 A12-3456-1001 1001 160 60 Hatchery Site A
2018-02-01 10:15:00 19340301 301 170 50 Wild Site B

Although the column order is not important, it is essential that this table contains two columns:

Release.date
Corresponds to the date and time when the fish was released, and must be typed in yyyy-mm-dd hh:mm:ss format. Note that the timestamps must be in the local time zone of the study area, which you will later supply in the tz argument.
Signal
Corresponds to the code emitted by your tags. If you are unsure as to what signals are, you should ask the tag manufacturer more about the differences between code spaces and signals.

The Group and Release.site columns are also important but, if you only have one group or one release site, actel can generate these columns for you. If this happens, you will receive a message:

M: No Release site has been indicated in the biometrics file. Creating a 'Release.site' column to avoid script failure. Filling with 'unspecified'.
M: No 'Group' column found in the biometrics file. Assigning all fish to group 'All'.
Note:
You can append as many columns as you want to biometrics.csv. Importantly, if the biometrics have columns with keywords such as length, weight or mass, graphics showing the distribution of these parameters per fish group will be drawn for you in the report.

Tags with sensors

Tags with single or multiple sensors (and multiple signals) can be listed in the biometrics file. If you have a tag that emits more than one signal, simply list both signals separated with “|” in the “Signal” column. Additionally, you can specify the respective sensor units by creating a new “Sensor.unit” column (only the first letter is capitalised), and including the units being recorded by each signal separated with a “|”. You can name your sensor units in whichever way you prefer, as long as you separate them with a single “|”. If your tag has three or more sensors, simply add the signals and respective sensor units separated with additional “|”’s.

If your tag only has one sensor, list it as usual in the signal column and, if you want to personalise the sensor unit name, include it also in the “Sensor.unit” column.

For example:

Release.date Signal Sensor.unit Length.mm Weight.g Group Release.site
2018-02-01 10:05:00 501 T 150 40 Wild Site A
2018-02-01 10:15:00 502|503 T|D 160 60 Hatchery Site A
2018-02-01 10:15:00 504|505|506 T|D|A 170 50 Wild Site B
Note:
If you do not include a “Sensor.unit” column, actel will simply use whatever sensor unit it can find in the raw detections file. To ensure that no issues arise from different manufacturers handling unit names differently, I recommend that you write the sensor unit even for single sensor tags.

Spatial file

Your spatial file should look similar to this:

Station.name Latitude Longitude Section Array Type
River East 8.411 40.411 River A1 Hydrophone
River West 8.521 40.521 River A1 Hydrophone
Estuary 8.402 40.402 River A3 Hydrophone
Site A 8.442 40.442 A1 Release
Site B 8.442 40.442 A2 Release

In this file you should include both your station deployment sites and your release sites. It is essential that this table has the following columns:

Station.name
The name of the station will be used to match the receiver deployments.
Array
If you are listing a station: The array to which the station belongs.
If you are listing a release site: The first array(s) that the fish is expected to cross after being released.
Section
The study area section to which the hydrophone station belongs. Leave empty for the release sites.
Type
The nature of the item you are listing. You must choose between “Hydrophone” or “Release”.

Note:

  1. The release sites must have exactly the same names in the biometrics table and in the spatial table. If there is a mismatch, actel will stop so you may correct this.

  2. If the spatial table contains release site information, but the biometrics table does not, release site information will be discarded and the following warning is issued:

Warning: At least one release site has been indicated in the spatial file, but no release sites were specified in the biometrics file.
   Discarding release site information to avoid script failure. Please doublecheck your data.

Click here to learn more about how to fit your study area into a spatial.csv file.

Deployments file

The deployments file is where you will list the receivers you used in the study. here is an example:

Receiver Station.name Start Stop
11111 River East 2018-01-01 11:30:00 2018-05-03 09:30:00
22222 River West 2018-01-01 11:33:00 2018-05-04 08:33:00
33333 Estuary 2018-01-01 11:34:00 2018-05-05 12:00:00

Short description of the columns:

Receiver
The serial number of the receiver. If a receiver was retrieved and re-deployed, you should add extra rows for each deployment.
Station.name
The name of the station where the receiver was deployed. It must match one of the station names in the spatial file.
Start and Stop
The times when the receiver was deployed and when the receiver was retrieved, respectively. Must be in a yyyy-mm-dd hh:mm:ss format. Note that these timestamps must be in the local time zone of the study area, which you will later supply in the tz argument.

Detections

Including your detections is the easiest part. Copy the .csv files offloaded from your receivers to the detections folder that was created by the createWorkspace() function and actel will know they are to be imported. Right now actel can upload files generated specifically for actel, or generated by VEMCO and THELMA manufactured receivers. If you have an receiver from a different manufacturer, or if you are using one of the supported manufacturers and get the warning displayed below, please to contact me so we can solve the issue.

Warning: File 'filename' does not match to any of the supported hydrophone file formats! If your file corresponds to a hydrophone log and actel did not recognize it, please get in contact through www.github.com/hugomflavio/actel/issues/new
Note:
The detection timestamps must be in UTC. This is the default time zone used by the receivers, so as long as you did not edit the raw files, all should be ready to go.

Standard detections file

If you are working with a large database, odds are your detection data is no longer in the format of any given manufacturer. To import this data into actel, please prepare a .csv file with the following format:

Timestamp Receiver CodeSpace Signal Sensor.Value Sensor.Unit
2018-01-01 11:30:00 11111 A12-3456 1234 10.0 T
2018-01-01 11:33:00 22222 A12-3456 1235 5.3 D
2018-01-01 11:34:00 33333 R64K 203 13.6 T
Note:
Timestamp data can use a space separator or a “T” separator between the date and time, but should always be in UTC.
Only list the receiver’s serial number in the receiver column (e.g. “13014” rather than “VR2W-13014”, or “111” rather than “TBR-111”).
The column order is irrelevant, but the column names should be an exact match to what is displayed.

Optional: Distances matrix

A distances matrix is a table that contains information on the distance (in metres) between every pair of spatial elements in your study area (i.e. your stations and your release sites). It looks like the table below:

St.1 St.2 St.3 St.4 St.5 St.6 Release
St.1 0 1366 3417 6912 8863 9272 2229
St.2 1366 0 2051 5545 7497 7906 3569
St.3 3417 2051 0 3528 5479 5888 5621
St.4 6912 5545 3528 0 1963 2372 9115
St.5 8863 7497 5479 1963 0 408 11067
St.6 9272 7906 5888 2372 408 0 11476
Release 2229 3569 5621 9115 11067 11476 0

You can learn more about creating distances matrices in this page.

Optional: spatial.txt file

If your study area has multiple branches, then you need to tell actel how your arrays connect with each other. This file is optional because, if omitted, actel will assume that your study area is composed by a single branch and that the array order is the one listed in the spatial.csv file.

The spatial.txt file is very simple; all you need to do is connect your arrays with either dashes or arrows (find out the difference below). You can connect your arrays in pairs or make longer strings. To decide whether or not two arrays should be connected, ask yourself the following: “Can fish move from A to B without having to pass through any other array?” If the answer is yes, then the arrays should be connected. Let’s have a look at some examples:

One branch study area, fully covering arrays

This is the simplest case, e.g. your fish are migrating through a channel, and your arrays split this channel in sections. Like in the figure below:

drawing

For this specific case you do not need to include a spatial.txt file, as long as you order the arrays in the spatial.csv file. However, if you wanted to include a spatial.txt file, it would look like this:

A -- B -- C -- D -- E

Note that you could also specify only one connection per line. E.g.:

A -- B 
B -- C 
C -- D 
D -- E

One branch study area, partly-covering arrays

This case is similar to above, but some arrays do not cover the totality. For example, they may cover a specific area of a lake (e.g. a spawning ground), Like in the figure below:

drawing

In this case, fish can move from B to D without passing through C. As such, a spatial.txt file is now required, and it would look like this:

A -- B -- D -- E
B -- C
C -- D

Note that now, B, C and D are all directly connected with each other.

Multi branch study areas

In this case, the water channel branches into multiple paths, which may then merge again or not, like in the figure below:

drawing

Things are getting a bit more complex now. To make sure you do not miss any connection, it can be useful to draw in a map the links between the arrays as you write them down, so you can check at the end if everything is done. Below, I have marked in purple the connections between all the arrays:

drawing

And here is the respective spatial.txt:

A -- B 
A -- E
A -- F
B -- E
B -- F
E -- F
B -- C
C -- D
C -- E
E -- D
F -- G

Or, in the shortened version:

A -- B -- C -- D
A -- E -- D
A -- F -- G
B -- E -- F
B -- F
C -- E

If are feeling wary of including multiple links in a single line, you can use the function readDot to read your spatial.txt file before starting the analysis and see if everything checks out. E.g.: readDot(input = "spatial.txt")

Note:
This file can be called either “spatial.dot” or “spatial.txt”, and you can create it using a simple text editor such as notepad, kate, VIM, etc.

Array direction

In the migration analysis, array direction is relevant. That means that writing A -- B is not the same as writing B -- A. In the first case, array B comes after array A (i.e. the fish are expected to reach A before reaching B), while in the second case the situation is reversed. This has implications for array efficiency calculations, so be sure to encode your study area carefully.

If you intend to run an explore or residency analysis, array direction is irrelevant.

explore()/residency() vs migration()

An optimal spatial.txt may differ depending on whether you want to run a migration or an explore/residency analysis. This is because actel interprets the spatial.txt file differently depending on the analysis:

In an explore/residency analyses, A -- B should be read as:

It is possible to move from array A to array B and vice-versa

In a migration analysis, A -- B should be read as:

It is possible to move from array A to array B and vice-versa, AND the fish are expected to pass through array A before reaching array B

Let’s have a look at this example.

drawing

In this case, the spatial.txt for a residency analysis would look like this:

A -- B -- C -- G
A -- D -- E -- G
D -- F -- G
D -- B
C -- E
C -- F
E -- F

But a spatial.txt for a migration analysis would look like this:

A -- B -- C -- G
A -- D -- E -- G
D -- F -- G
D -- B -- D
C -- E -- C
C -- F -- C
E -- F -- E

The difference here is that, in the spatial.txt for migration, actel expects arrays to connected in a directional way. That means that connecting D -- B and B -- D does not mean the same thing, and it is important that you explicitly state that the fish are as likely to reach B coming from D as they are to reach D coming from B. In other words, you are saying that neither of the movements would be an explicit backwards movement. This will inform actel that D and B are parallel arrays, and that special rules should apply to them. The same goes for C, E and F.

Barriers

Sometimes it is possible for your fish to go from A to B, but not from B to A. In actel, you can include this information to further refine the analysis. In total, there are three symbols you can use to connect your arrays: --, -> or <-.

If the fish can move from A to B and vice-versa, use A -- B.

If the fish can only move from A to B, use either A -> B or B <- A.

If a fish then actually moves from B to A, actel will try to find an alternative route that the fish could have taken. If there are no alternative routes from B to A, actel will call this situation to your attention.

This has implications for array efficiency calculations, so be sure to encode your study area carefully.

You can find more information about impassable barriers here.

Running the analysis

Now that you have prepared your workspace, it is time to start the analysis. The first thing you need to do is decide which analysis you want to run: explore(), migration() or residency(). Here is some more information on them:

1. explore()

explore() allows you to quickly get a summary of your data. You can use explore() to get a general feel for the study results, and check if the input files are behaving as expected. It is also a good candidate if you just want to validate your detections for later use in other analyses.

Learn more about the explore function.

2. migration()

The migration() analysis runs the same initial checks as explore(), but on top of it, it analyses the fish behaviour. By selecting the arrays that lead to success, you can define whether or not your fish survived the migration. Additional plots help you find out if some fish has been acting odd. Multiple options allow you to tweak the analysis to fit your study perfectly.

Learn more about the migration function.

3. residency()

The residency() analysis runs the same initial checks as explore(), but, similarly to migration, explores particular points of the fish behaviour. If you want to know where your fish were in each day of the study, how many fish were in each section each day, and other residency-focused variables, this is the analysis you are looking for!

Learn more about the residency function.

Back to top.