Commit 4b11e370 authored by Alex Nunes

Added a few more examples to the analysis notebook

parent e8ba1b80
+10 −0
%% Cell type:markdown id: tags:

# Data Analysis Using GLATOS

__Data analysis__ is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making *(https://en.wikipedia.org/wiki/Data_analysis)*.

The point is to turn data into information, and information into knowledge. There are many ways to compare dependent and independent variables, find relationships, build models, and predict behaviours. Here, however, we focus only on analyzing time and location data in the simplest way.

First, clean and filter the data:

%% Cell type:code id: tags:

``` R
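library(dplyr)
library(glatos)

# Read the OTN-formatted detection extract
detections <- glatos::read_otn_detections("data/nsbs_matched_detections_2014.csv")
# Drop the rows that record an animal's release rather than a receiver detection
detections <- detections %>% filter(!stringr::str_detect(unqdetecid, "release"))
# Flag possible false detections using a time threshold (tf) of 3600 seconds
detections <- glatos::false_detections(detections, tf = 3600)
# Keep only the detections that passed the filter
filtered_detections <- detections %>% filter(passed_filter != FALSE)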
```
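
%% Cell type:markdown id: tags:

As a rough illustration of what a time-threshold filter like this does, the sketch below flags a detection as suspect when no other detection of the same tag on the same receiver falls within ``tf`` seconds of it. It is a simplified approximation written with ``dplyr``, not the ``glatos`` implementation, and the toy data frame and its column names (``tag``, ``receiver``, ``time``, ``passed``) are made up for the example.

%% Cell type:code id: tags:

```r
library(dplyr)

# Hypothetical detections: tag A is heard three times on receiver R1, tag B only once
toy <- data.frame(
  tag      = c("A", "A", "A", "B"),
  receiver = c("R1", "R1", "R1", "R1"),
  time     = as.POSIXct(c("2014-06-01 00:00:00", "2014-06-01 00:10:00",
                          "2014-06-01 05:00:00", "2014-06-01 00:05:00"), tz = "UTC")
)

tf <- 3600  # time threshold in seconds, same value as in the cell above

toy_flagged <- toy %>%
  group_by(tag, receiver) %>%
  arrange(time, .by_group = TRUE) %>%
  mutate(
    # Seconds to the previous and next detection of the same tag on the same receiver
    gap_before = as.numeric(difftime(time, lag(time), units = "secs")),
    gap_after  = as.numeric(difftime(lead(time), time, units = "secs")),
    # Smallest gap to a neighbouring detection (NA when the tag was heard only once)
    min_gap    = pmin(gap_before, gap_after, na.rm = TRUE),
    # A detection passes when it has a neighbour within tf seconds
    passed     = !is.na(min_gap) & min_gap <= tf
  ) %>%
  ungroup()

toy_flagged
```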

%% Cell type:code id: tags:

``` R
detection_events <- glatos::detection_events(filtered_detections)
detection_events
```

%% Cell type:markdown id: tags:

## Time Series Analysis & Lubridate

Time series show the when, the before, and the after for data points. The ``lubridate`` package is especially useful for handling time calculations.

Date-time data can be frustrating to work with in R. R commands for date-times are generally unintuitive and change depending on the type of date-time object being used. Moreover, the methods we use with date-times must be robust to time zones, leap days, daylight savings times, and other time related quirks, and R lacks these capabilities in some situations. Lubridate makes it easier to do the things R does with date-times and possible to do the things R does not.

*Source: https://lubridate.tidyverse.org*

%% Cell type:code id: tags:

``` R
library(lubridate)
detection_events <- detection_events %>% mutate(detection_interval = lubridate::interval(first_detection, last_detection))

# detection_events
```
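
%% Cell type:markdown id: tags:

To get a feel for what an interval is before using it on the real data, here is a minimal sketch with three made-up visits. The timestamps and the names ``visit_a``, ``visit_b``, and ``visit_c`` are hypothetical; ``interval()``, ``int_length()``, and ``int_overlaps()`` are ``lubridate`` functions.

%% Cell type:code id: tags:

```r
library(lubridate)

# Three hypothetical visits (ymd_hms() parses these as UTC by default)
visit_a <- interval(ymd_hms("2014-06-01 10:00:00"), ymd_hms("2014-06-01 12:00:00"))
visit_b <- interval(ymd_hms("2014-06-01 11:30:00"), ymd_hms("2014-06-01 13:00:00"))
visit_c <- interval(ymd_hms("2014-06-02 09:00:00"), ymd_hms("2014-06-02 09:15:00"))

int_length(visit_a)             # length in seconds: 7200
int_overlaps(visit_a, visit_b)  # TRUE, both cover 11:30 to 12:00
int_overlaps(visit_a, visit_c)  # FALSE, the visits are on different days
```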

%% Cell type:markdown id: tags:

Now that we have an interval column, we can go row by row and check, for each location, whether more than one animal was detected at that station during overlapping intervals. This is the beginning of a cohort analysis.

%% Cell type:code id: tags:

``` R
# Loop over each individual detection event
# (assumes event numbers match row positions in detection_events)
for(event in detection_events$event) {

    # Rows at the same location, from a different event, whose interval overlaps this one
    overlapping <- which(detection_events$location == detection_events$location[event] &
                         detection_events$event != event &
                         lubridate::int_overlaps(detection_events$detection_interval[event],
                                                 detection_events$detection_interval))

    # Use paste to store the overlapping events as a comma-separated string
    detection_events$overlaps_with[event] <- paste(overlapping, collapse=",")
}
detection_events
```
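
%% Cell type:markdown id: tags:

The same overlap check can be tried on a tiny table where the answer is easy to verify by eye. This is only a sketch: ``toy_events`` and its values are invented, and it uses ``vapply()`` in place of the ``for`` loop, which is a stylistic swap rather than anything the analysis requires.

%% Cell type:code id: tags:

```r
library(lubridate)

# Hypothetical events: two fish at station_A with overlapping visits, one fish alone at station_B
toy_events <- data.frame(
  event           = 1:3,
  individual      = c("fish_1", "fish_2", "fish_3"),
  location        = c("station_A", "station_A", "station_B"),
  first_detection = ymd_hms(c("2014-06-01 10:00:00", "2014-06-01 11:00:00", "2014-06-01 10:30:00")),
  last_detection  = ymd_hms(c("2014-06-01 12:00:00", "2014-06-01 13:00:00", "2014-06-01 11:30:00"))
)
toy_events$detection_interval <- interval(toy_events$first_detection, toy_events$last_detection)

# For each row, collect the event numbers at the same location with an overlapping interval
toy_events$overlaps_with <- vapply(seq_len(nrow(toy_events)), function(i) {
  hits <- which(toy_events$location == toy_events$location[i] &
                toy_events$event != toy_events$event[i] &
                int_overlaps(toy_events$detection_interval[i], toy_events$detection_interval))
  paste(toy_events$event[hits], collapse = ",")
}, character(1))

toy_events[, c("event", "individual", "location", "overlaps_with")]
```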

%% Cell type:markdown id: tags:

We can then filter based on whether or not the ``overlaps_with`` string is empty.

%% Cell type:code id: tags:

``` R
detection_events %>% select(-detection_interval) %>% filter(overlaps_with != '')
```

%% Cell type:markdown id: tags:

## Summarise

``summarise()`` creates a new data frame by applying summary functions to grouped data.

``summarise()`` is typically used on grouped data created by ``group_by()``. The output will have one row for each group.

%% Cell type:code id: tags:

``` R
detection_events %>% group_by(location) %>% summarise(detection_count = sum(num_detections),
                                                      num_unique_tags = n_distinct(individual),
                                                      total_residence_time_in_seconds = sum(lubridate::int_length(detection_interval)))
```
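
%% Cell type:markdown id: tags:

The one-row-per-group behaviour is easier to see on a small table. The sketch below uses a made-up data frame (``toy_visits``, with hypothetical station and fish names) and the same ``group_by()`` and ``summarise()`` pattern as above.

%% Cell type:code id: tags:

```r
library(dplyr)

# Hypothetical events: two at station_A, one at station_B
toy_visits <- data.frame(
  location       = c("station_A", "station_A", "station_B"),
  individual     = c("fish_1", "fish_2", "fish_1"),
  num_detections = c(5, 3, 10)
)

# Three input rows collapse to two output rows, one per location
toy_summary <- toy_visits %>%
  group_by(location) %>%
  summarise(detection_count = sum(num_detections),
            num_unique_tags = n_distinct(individual))

toy_summary
```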

%% Cell type:markdown id: tags:

## Plotting

%% Cell type:code id: tags:

``` R
glatos::abacus_plot(filtered_detections, location_col = 'animal_id')
```
