Commit 1e39f3e9 authored by Alex Nunes

Fixes for REI calculation, docs, and tests

 Changes to be committed:
	modified:   docs/index.rst
	modified:   docs/notebooks/interval_data.ipynb.rst
	modified:   docs/notebooks/receiver_efficiency_index.ipynb.rst
	modified:   py_notebooks/interval_data.ipynb
	modified:   py_notebooks/receiver_efficiency_index.ipynb
	modified:   resonate/filters.py
	modified:   resonate/receiver_efficiency.py
	modified:   tests/assertion_files/nsbs_interval.csv
	modified:   tests/interval_test.py
parent b5b00132
+1 −1
@@ -101,7 +101,7 @@ This residence index tool will take a compressed or uncompressed detection file
Receiver Efficiency Index
-------------------------

-*(Ellis, R., Flaherty-Walia, K., Collins, A., Bickford, J., Walters Burnsed,  Lowerre-Barbieri S. 2018. Acoustic telemetry array evolution: from species- and project-specific designs to large-scale, multispecies, cooperative networks)*
+`(Ellis, R., Flaherty-Walia, K., Collins, A., Bickford, J., Walters Burnsed,  Lowerre-Barbieri S. 2018. Acoustic telemetry array evolution: from species- and project-specific designs to large-scale, multispecies, cooperative networks) <https://doi.org/10.1016/j.fishres.2018.09.015>`_

The receiver efficiency index is a number between ``0`` and ``1`` indicating the amount of relative activity at each receiver compared to the entire set of receivers, regardless of positioning. The function takes a set of detections and a deployment history of the receivers to create a context for the detections. Both the number of unique tags and the number of species are taken into consideration in the calculation. For the exact method, see the details in :ref:`Receiver Efficiency Index<receiver_efficiency_index_page>`.

+2 −2
@@ -47,11 +47,11 @@ You can modify individual stations if needed by using

.. code:: python

-    station_name = 'station'
+    station_name = 'HFX001'
    
    station_detection_radius = 500
    
-    station_det_radius.set_value(station_name, 'radius', geopy.distance.Distance( station_detection_radius/1000.0 ))
+    station_det_radius.at[station_name, 'radius'] = geopy.distance.Distance( station_detection_radius/1000.0 )

Create the interval data by passing the compressed detections, the
matrix, and the station radii.
+2 −1
@@ -16,7 +16,7 @@ formula of:
.. container:: large-math

   REI =
-   :math:`\frac{\left(\frac{T_r}{T_a} \times \frac{S_r}{S_a}\right) / \left(\frac{DD_a}{DD_r}\right)}{D_r}`
+   :math:`\frac{T_r}{T_a} \times \frac{S_r}{S_a} \times \frac{DD_r}{DD_a} \times \frac{D_a}{D_r}`

.. raw:: html

@@ -31,6 +31,7 @@ formula of:
   receivers
-  :math:`DD_r` = The number of unique days with detections on the
   receiver
+-  :math:`D_a` = The number of days the array was active
-  :math:`D_r` = The number of days the receiver was active

Each REI is then normalized against the sum of all considered stations.
+2 −2
%% Cell type:markdown id: tags:

# Interval Data

<hr>

``interval_data()`` takes a compressed detections DataFrame, a distance matrix, and a detection radius DataFrame and
creates an interval data DataFrame.

Intervals are lengths of time in which a station detected an animal. Many consecutive detections of an animal are replaced by one interval.

<span style="color:red">Warning:</span>

    Input files must include ``datecollected``, ``catalognumber``, and ``unqdetecid`` as columns.

%% Cell type:code id: tags:

``` python
from resonate.filters import get_distance_matrix
from resonate.compress import compress_detections
from resonate.interval_data_tool import interval_data
import pandas as pd
import geopy

input_file = pd.read_csv("/path/to/detections.csv")
compressed = compress_detections(input_file)
matrix = get_distance_matrix(input_file)
```

%% Cell type:markdown id: tags:

Set the station radius for each station name.

%% Cell type:code id: tags:

``` python
detection_radius = 400

station_det_radius = pd.DataFrame([(x, geopy.distance.Distance(detection_radius/1000.0))
                                   for x in matrix.columns.tolist()], columns=['station','radius'])

station_det_radius.set_index('station', inplace=True)

station_det_radius
```

%% Cell type:markdown id: tags:

You can modify individual stations if needed by using ``DataFrame.at`` from Pandas (the replacement for the deprecated ``DataFrame.set_value()``).

%% Cell type:code id: tags:

``` python
-station_name = 'station'
+station_name = 'HFX001'

station_detection_radius = 500

-station_det_radius.set_value(station_name, 'radius', geopy.distance.Distance( station_detection_radius/1000.0 ))
+station_det_radius.at[station_name, 'radius'] = geopy.distance.Distance( station_detection_radius/1000.0 )
```

%% Cell type:markdown id: tags:

Create the interval data by passing the compressed detections, the matrix, and the station radii.

%% Cell type:code id: tags:

``` python
interval = interval_data(compressed_df=compressed, dist_matrix_df=matrix, station_radius_df=station_det_radius)

interval
```

%% Cell type:markdown id: tags:

You can use the Pandas `DataFrame.to_csv()` function to output the file to a desired location.

%% Cell type:code id: tags:

``` python
interval.to_csv('/path/to/output.csv', index=False)
```
+5 −2
%% Cell type:markdown id: tags:

# Receiver Efficiency Index

-The receiver efficiency index is a number between ``0`` and ``1`` indicating the amount of relative activity at each receiver compared to the entire set of receivers, regardless of positioning. The function takes a set of detections and a deployment history of the receivers to create a context for the detections. Both the number of unique tags and the number of species are taken into consideration in the calculation.
+The receiver efficiency index is a number between ``0`` and ``1`` indicating the amount of relative activity at each receiver compared to the entire set of receivers, regardless of positioning.
+The function takes a set of detections and a deployment history of the receivers to create a context for the detections. Both the number of unique tags and the number of species are taken into
+consideration in the calculation.

The receiver efficiency index is implemented based on the paper [paper place holder]. Each receiver's index is calculated using the formula:


<br/>

<div class="large-math">

-REI = $\frac{\left(\frac{T_r}{T_a} \times \frac{S_r}{S_a}\right) / \left(\frac{DD_a}{DD_r}\right)}{D_r}$
+REI = $\frac{T_r}{T_a} \times \frac{S_r}{S_a} \times \frac{DD_r}{DD_a} \times \frac{D_a}{D_r}$

</div>

<hr/>

* REI = Receiver Efficiency Index
* $T_r$ = The number of tags detected on the receiver
* $T_a$ = The number of tags detected across all receivers
* $S_r$ = The number of species detected on the receiver
* $S_a$ = The number of species detected across all receivers
* $DD_a$ = The number of unique days with detections across all receivers
* $DD_r$ = The number of unique days with detections on the receiver
+* $D_a$ = The number of days the array was active
* $D_r$ = The number of days the receiver was active


Each REI is then normalized against the sum of all considered stations. The result is a number between ``0`` and ``1`` indicating the relative amount of activity at each receiver.
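
As a rough illustration of the formula and the normalization step, the sketch below computes the index for two receivers in plain Python. It does not call `resonate`; the helper `raw_rei`, the station names, and all counts are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def raw_rei(t_r, t_a, s_r, s_a, dd_r, dd_a, d_a, d_r):
    """Un-normalized REI for one receiver, term by term as in the formula."""
    return (t_r / t_a) * (s_r / s_a) * (dd_r / dd_a) * (d_a / d_r)


# Hypothetical array-wide totals (tags, species, detection days, active days).
array_totals = {"t_a": 20, "s_a": 4, "dd_a": 100, "d_a": 200}

# Hypothetical per-receiver counts.
receivers = {
    "HFX001": {"t_r": 10, "s_r": 2, "dd_r": 50, "d_r": 200},
    "HFX002": {"t_r": 5, "s_r": 1, "dd_r": 25, "d_r": 100},
}

# Raw index per receiver.
raw = {name: raw_rei(**counts, **array_totals) for name, counts in receivers.items()}

# Normalize against the sum over all considered stations.
total = sum(raw.values())
normalized = {name: value / total for name, value in raw.items()}

print(raw)
print(normalized)
```

With these made-up counts, `HFX002` was active for only half as long as the array, so its $\frac{D_a}{D_r}$ term doubles its raw value; after normalization the two indices sum to `1`.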


<span style="color:red">Warning:</span>

    Detection input files must include ``datecollected``, ``fieldnumber``, ``station``, and ``scientificname`` as columns and deployment input files must include ``station_name``, ``deploy_date``, ``last_download``, and ``recovery_date`` as columns.

``REI()`` takes two arguments. The first is a dataframe of detections containing the detection timestamp, the station identifier, the species, and the tag identifier. The second is a dataframe of deployments for each station. The station name should match the stations in the detections. The deployments need to include a deployment date and either a recovery date or a last download date. For details on the columns mentioned, see the preparing data section.

<span style="color:red">Warning:</span>

    This function assumes that no deployments for a single station overlap. If deployments do overlap, the overlapping days will be counted twice.
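
One way to check that assumption before calling ``REI()`` is sketched below. This is not part of `resonate`: `find_overlapping_deployments` is a hypothetical helper, and it assumes each deployment has been reduced to a `(station_name, deploy_date, recovery_date)` tuple with no missing dates.

```python
from datetime import date


def find_overlapping_deployments(deployments):
    """Return (station, earlier, later) for deployments at a station that overlap.

    `deployments` is an iterable of (station_name, deploy_date, recovery_date).
    """
    by_station = {}
    for station, start, end in deployments:
        by_station.setdefault(station, []).append((start, end))

    overlaps = []
    for station, periods in by_station.items():
        periods.sort()
        latest = periods[0]  # deployment with the latest end date seen so far
        for current in periods[1:]:
            # Assumption: starting on the day the previous deployment ends is
            # not flagged; change `<` to `<=` to treat that as an overlap too.
            if current[0] < latest[1]:
                overlaps.append((station, latest, current))
            if current[1] > latest[1]:
                latest = current
    return overlaps


# Hypothetical deployment history: HFX001 has two overlapping deployments.
deployments = [
    ("HFX001", date(2018, 1, 1), date(2018, 6, 30)),
    ("HFX001", date(2018, 6, 1), date(2018, 12, 31)),
    ("HFX002", date(2018, 1, 1), date(2018, 6, 30)),
    ("HFX002", date(2018, 7, 1), date(2018, 12, 31)),
]

overlaps = find_overlapping_deployments(deployments)
print(overlaps)
```

An empty result suggests the deployment history is safe to pass on as-is; any reported pair would need to be trimmed or merged first.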

%% Cell type:code id: tags:

``` python
import pandas as pd
from resonate.receiver_efficiency import REI

detections = pd.read_csv('/path/to/detections.csv')
deployments = pd.read_csv('/path/to/deployments.csv')

station_REIs = REI(detections = detections, deployments = deployments)
```