Classify a timestamp as occurring before or after a distance limit is reached in R

I have a dataframe of timestamped lat-lon point locations from animal GPS tracking data, grouped into separate trips made by each animal. For each timestamped lat-lon, I also have the distance from the point to the animal's home colony (in km).

I would like to classify each point according to whether it occurred before or after the animal reached its maximum distance from its home colony.

The aim is to have a column in the dataframe stating whether the timestamped lat-lon occurs during the outward section of the animal's trip (defined as all points before the animal reached its maximum distance from its home colony) or the return section (all points that occurred after the animal reached its maximum distance from its home colony and before it returned to the colony).

Example data from 2 trips are given under "#example df" at the end of the code below.

My desired output is the table below: the example data with an added 'Loc_Class' (location classification) column, where MAX = the point of maximum distance from the colony, OUT = points before the animal reaches that MAX, and RET = points after the animal has reached its maximum distance from the colony and is returning to it.

| Trip_ID | Timestamp        | LON       | LAT        | Colony_lat | Colony_lon | Dist_to_Colony | Loc_Class |
| ------- | ---------------- | --------- | ---------- | ---------- | ---------- | -------------- | --------- |
| A       | 18/01/2022 14:00 | -2.81698  | -69.831474 | -71.89     | 5.159      | 369.9948202    | MAX       |
| A       | 18/01/2022 14:30 | -2.750411 | -69.811873 | -71.89     | 5.159      | 369.5644383    | RET       |
| A       | 18/01/2022 15:00 | -2.736943 | -69.811022 | -71.89     | 5.159      | 369.2463158    | RET       |
| A       | 18/01/2022 15:30 | -2.645026 | -69.804136 | -71.89     | 5.159      | 367.1665826    | RET       |
| A       | 18/01/2022 16:00 | -2.56825  | -69.833432 | -71.89     | 5.159      | 362.7877481    | RET       |
| B       | 18/01/2022 21:30 | -3.046828 | -69.784849 | -71.89     | 5.159      | 380.0350746    | OUT       |
| B       | 18/01/2022 22:00 | -3.080154 | -69.765688 | -71.89     | 5.159      | 382.4142364    | OUT       |
| B       | 19/01/2022 00:30 | -3.025742 | -69.634483 | -71.89     | 5.159      | 390.8078861    | MAX       |
| B       | 19/01/2022 01:00 | -2.898522 | -69.672147 | -71.89     | 5.159      | 384.3511473    | RET       |
| B       | 19/01/2022 01:30 | -2.907463 | -69.769916 | -71.89     | 5.159      | 377.173593     | RET       |

library(tidyverse)  # attaches dplyr, so a separate library(dplyr) call is not needed
library(geosphere)

# load dataframe
df <- read.csv("Tracking_Data.csv")

# Geodesic distance (distGeo returns metres) between each timestamped location and the animal's colony
df_2 <- df %>% mutate(Dist_to_Colony = distGeo(cbind(LON, LAT), cbind(Colony_lon, Colony_lat)))

# change distance to colony from m to km
df_2 <- df_2 %>% mutate(Dist_to_Colony = Dist_to_Colony / 1000)

# find the maximum distance to colony for each trip (the grouping column is Trip_ID, as in the example data)
Max_dist_colony <- df_2 %>% group_by(Trip_ID) %>% summarise(across(c(Dist_to_Colony), max))

# so now I need to classify each point using the 'Timestamp' and 'Dist_to_Colony' columns and make a 'Loc_Class' column:

#example df

| Trip_ID | Timestamp        | LON       | LAT        | Colony_lat | Colony_lon | Dist_to_Colony |
| ------- | ---------------- | --------- | ---------- | ---------- | ---------- | -------------- |
| A       | 18/01/2022 14:00 | -2.81698  | -69.831474 | -71.89     | 5.159      | 369.9948202    |
| A       | 18/01/2022 14:30 | -2.750411 | -69.811873 | -71.89     | 5.159      | 369.5644383    |
| A       | 18/01/2022 15:00 | -2.736943 | -69.811022 | -71.89     | 5.159      | 369.2463158    |
| A       | 18/01/2022 15:30 | -2.645026 | -69.804136 | -71.89     | 5.159      | 367.1665826    |
| A       | 18/01/2022 16:00 | -2.56825  | -69.833432 | -71.89     | 5.159      | 362.7877481    |
| B       | 18/01/2022 21:30 | -3.046828 | -69.784849 | -71.89     | 5.159      | 380.0350746    |
| B       | 18/01/2022 22:00 | -3.080154 | -69.765688 | -71.89     | 5.159      | 382.4142364    |
| B       | 19/01/2022 00:30 | -3.025742 | -69.634483 | -71.89     | 5.159      | 390.8078861    |
| B       | 19/01/2022 01:00 | -2.898522 | -69.672147 | -71.89     | 5.159      | 384.3511473    |
| B       | 19/01/2022 01:30 | -2.907463 | -69.769916 | -71.89     | 5.159      | 377.173593     |

Something like this?

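# Three-way comparison: out[1] where vec < val, out[2] where equal (within tolerance), out[3] where vec > val
comp3 <- function(vec, val, out = -1:1) {
  ifelse(abs(vec - val) < 1e-9, out[2], ifelse(vec < val, out[1], out[3]))
}
# quux holds the question's desired-output table; this assumes rows are in time order within each trip
quux %>%
  group_by(Trip_ID) %>%
  mutate(Direction = comp3(row_number(), which.max(Dist_to_Colony), c("OUT", "MAX", "RET"))) %>%
  ungroup()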
# # A tibble: 10 x 9
#    Trip_ID Timestamp          LON   LAT Colony_lat Colony_lon Dist_to_Colony Loc_Class Direction
#    <chr>   <chr>            <dbl> <dbl>      <dbl>      <dbl>          <dbl> <chr>     <chr>    
#  1 A       18/01/2022 14:00 -2.82 -69.8      -71.9       5.16           370. MAX       MAX      
#  2 A       18/01/2022 14:30 -2.75 -69.8      -71.9       5.16           370. RET       RET      
#  3 A       18/01/2022 15:00 -2.74 -69.8      -71.9       5.16           369. RET       RET      
#  4 A       18/01/2022 15:30 -2.65 -69.8      -71.9       5.16           367. RET       RET      
#  5 A       18/01/2022 16:00 -2.57 -69.8      -71.9       5.16           363. RET       RET      
#  6 B       18/01/2022 21:30 -3.05 -69.8      -71.9       5.16           380. OUT       OUT      
#  7 B       18/01/2022 22:00 -3.08 -69.8      -71.9       5.16           382. OUT       OUT      
#  8 B       19/01/2022 00:30 -3.03 -69.6      -71.9       5.16           391. MAX       MAX      
#  9 B       19/01/2022 01:00 -2.90 -69.7      -71.9       5.16           384. RET       RET      
# 10 B       19/01/2022 01:30 -2.91 -69.8      -71.9       5.16           377. RET       RET      
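
The row_number() comparison above relies on the rows already being in chronological order within each trip. If that is not guaranteed, one option is to parse Timestamp and sort before classifying. What follows is only a minimal sketch under some assumptions: the day/month/year format shown in the example data, UTC as the time zone (a guess, not stated in the question), and made-up names `tracks`, `ts` and `classified`. It uses dplyr's case_when in place of comp3 purely to keep the block self-contained.

```r
library(dplyr)

# Made-up sample: trip B's rows are deliberately out of time order.
tracks <- data.frame(
  Trip_ID = c("B", "B", "B", "B"),
  Timestamp = c("19/01/2022 01:00", "18/01/2022 21:30",
                "19/01/2022 00:30", "18/01/2022 22:00"),
  Dist_to_Colony = c(384.35, 380.04, 390.81, 382.41)
)

classified <- tracks %>%
  # Parse the character timestamps (format and time zone are assumptions).
  mutate(ts = as.POSIXct(Timestamp, format = "%d/%m/%Y %H:%M", tz = "UTC")) %>%
  # Sort chronologically within each trip so row position reflects time.
  arrange(Trip_ID, ts) %>%
  group_by(Trip_ID) %>%
  mutate(Loc_Class = case_when(
    row_number() <  which.max(Dist_to_Colony) ~ "OUT",
    row_number() == which.max(Dist_to_Colony) ~ "MAX",
    TRUE                                      ~ "RET"
  )) %>%
  ungroup()

classified$Loc_Class
# [1] "OUT" "OUT" "MAX" "RET"
```

Note that which.max() returns the first position of the maximum, so if a trip has tied maxima, only the earliest one is labelled MAX.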

The comp3 function is really just a ternary-result comparison function: instead of something like +(vec > val) that returns just 0 (false) and 1 (true), this gives a third result when they are equal. For example,

comp3(1:5, 4)
# [1] -1 -1 -1  0  1

The extension to that is the out= argument, which allows the user to specify what the three values should be instead of -1:1. (If you want to shorten the dplyr code, feel free to hard-code the default value of out= to be your string vector.)
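
As a small illustration of out=, here is the same call as before with the three labels from the question supplied. comp3 is repeated so the snippet runs on its own; `labels` is just an illustrative variable name.

```r
# comp3 as defined in the answer.
comp3 <- function(vec, val, out = -1:1) {
  ifelse(abs(vec - val) < 1e-9, out[2], ifelse(vec < val, out[1], out[3]))
}

# Positions before 4 get out[1], position 4 gets out[2], positions after get out[3].
labels <- comp3(1:5, 4, out = c("OUT", "MAX", "RET"))
labels
# [1] "OUT" "OUT" "OUT" "MAX" "RET"
```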

Another note: the use of abs(vec - val) < 1e-9 is another step towards generalizing it. With floating-point (numeric) values, strict equality can fail for numbers of high precision (cf. Why are these numbers not equal?, Is floating point math broken?, and https://en.wikipedia.org/wiki/IEEE_754). In this case it's a little overkill, since row_number() and which.max() both return integers, but it will not return a different value. (And since you talk of a table with 4000 or so locations, the "overhead" of doing this one extra step will likely not be human-apparent.)
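
To see why a tolerance can matter for numeric values, here is the classic base-R demonstration. The variable names are illustrative, and the 1e-9 tolerance simply mirrors the one used in comp3.

```r
x <- 0.1 + 0.2

strict   <- x == 0.3                   # exact comparison
tolerant <- abs(x - 0.3) < 1e-9        # comparison within a small tolerance
base_way <- isTRUE(all.equal(x, 0.3))  # base R's own tolerance-based check

c(strict = strict, tolerant = tolerant, base_way = base_way)
#   strict tolerant base_way
#    FALSE     TRUE     TRUE
```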
