Classify a timestamp as occurring before or after a distance limit is reached in R
I have a dataframe of timestamped lat-lon point locations from animal GPS tracking data, grouped into separate trips made by each animal. For each timestamped lat-lon, I also have the distance from the point to the animal's home colony (in km).
I would like to classify each point by whether it occurred before or after the animal reached its maximum distance from its home colony.
The aim is to have a column in the dataframe stating whether the timestamped lat-lon occurs during the outward section of the animal's trip (defined as all points before the animal reached its maximum distance from its home colony) or the return section (all points that occurred after the animal reached its maximum distance from its home colony and before it returned to the colony).
Example data from 2 trips is given at the end of this question, under `#example df`.
My desired output is the table below: the example data with the addition of a 'Loc_Class' (location classification) column, where MAX = the point of maximum distance from the colony, OUT = points falling before the animal reaches that MAX, and RET = points after the animal has reached its maximum distance from the colony and is returning to it.
Trip_ID | Timestamp | LON | LAT | Colony_lat | Colony_lon | Dist_to_Colony | Loc_Class |
---|---|---|---|---|---|---|---|
A | 18/01/2022 14:00 | -2.81698 | -69.831474 | -71.89 | 5.159 | 369.9948202 | MAX |
A | 18/01/2022 14:30 | -2.750411 | -69.811873 | -71.89 | 5.159 | 369.5644383 | RET |
A | 18/01/2022 15:00 | -2.736943 | -69.811022 | -71.89 | 5.159 | 369.2463158 | RET |
A | 18/01/2022 15:30 | -2.645026 | -69.804136 | -71.89 | 5.159 | 367.1665826 | RET |
A | 18/01/2022 16:00 | -2.56825 | -69.833432 | -71.89 | 5.159 | 362.7877481 | RET |
B | 18/01/2022 21:30 | -3.046828 | -69.784849 | -71.89 | 5.159 | 380.0350746 | OUT |
B | 18/01/2022 22:00 | -3.080154 | -69.765688 | -71.89 | 5.159 | 382.4142364 | OUT |
B | 19/01/2022 00:30 | -3.025742 | -69.634483 | -71.89 | 5.159 | 390.8078861 | MAX |
B | 19/01/2022 01:00 | -2.898522 | -69.672147 | -71.89 | 5.159 | 384.3511473 | RET |
B | 19/01/2022 01:30 | -2.907463 | -69.769916 | -71.89 | 5.159 | 377.173593 | RET |
library(tidyverse)  # also attaches dplyr
library(geosphere)
# load dataframe
df <- read.csv("Tracking_Data.csv")
# Great circle (geodesic): add the distance (in m) between each timestamped location and the animal's colony
df_2 <- df %>% mutate(Dist_to_Colony = distGeo(cbind(LON, LAT), cbind(Colony_lon, Colony_lat)))
# change distance to colony from m to km
df_2 <- df_2 %>% mutate(Dist_to_Colony = Dist_to_Colony / 1000)
# find the maximum distance to colony for each trip
Max_dist_colony <- df_2 %>% group_by(Trip_ID) %>% summarise(across(c(Dist_to_Colony), max))
# so now I need to classify each point using the 'Timestamp' and 'Dist_to_Colony' columns and make a 'Loc_Class' column:
#example df
| Trip_ID | Timestamp | LON | LAT | Colony_lat | Colony_lon | Dist_to_Colony |
|---|---|---|---|---|---|---|
| A | 18/01/2022 14:00 | -2.81698 | -69.831474 | -71.89 | 5.159 | 369.9948202 |
| A | 18/01/2022 14:30 | -2.750411 | -69.811873 | -71.89 | 5.159 | 369.5644383 |
| A | 18/01/2022 15:00 | -2.736943 | -69.811022 | -71.89 | 5.159 | 369.2463158 |
| A | 18/01/2022 15:30 | -2.645026 | -69.804136 | -71.89 | 5.159 | 367.1665826 |
| A | 18/01/2022 16:00 | -2.56825 | -69.833432 | -71.89 | 5.159 | 362.7877481 |
| B | 18/01/2022 21:30 | -3.046828 | -69.784849 | -71.89 | 5.159 | 380.0350746 |
| B | 18/01/2022 22:00 | -3.080154 | -69.765688 | -71.89 | 5.159 | 382.4142364 |
| B | 19/01/2022 00:30 | -3.025742 | -69.634483 | -71.89 | 5.159 | 390.8078861 |
| B | 19/01/2022 01:00 | -2.898522 | -69.672147 | -71.89 | 5.159 | 384.3511473 |
| B | 19/01/2022 01:30 | -2.907463 | -69.769916 | -71.89 | 5.159 | 377.173593 |
Something like this?
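# Three-way comparison: out[1] where vec < val, out[2] where vec equals val (within 1e-9), out[3] where vec > val
comp3 <- function(vec, val, out = -1:1) ifelse(abs(vec - val) < 1e-9, out[2], ifelse(vec < val, out[1], out[3]))
# quux holds the question's desired-output table (so Loc_Class is already present for comparison)
quux %>%
  group_by(Trip_ID) %>%
  mutate(Direction = comp3(row_number(), which.max(Dist_to_Colony), c("OUT", "MAX", "RET"))) %>%
  ungroup()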
# # A tibble: 10 x 9
# Trip_ID Timestamp LON LAT Colony_lat Colony_lon Dist_to_Colony Loc_Class Direction
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
# 1 A 18/01/2022 14:00 -2.82 -69.8 -71.9 5.16 370. MAX MAX
# 2 A 18/01/2022 14:30 -2.75 -69.8 -71.9 5.16 370. RET RET
# 3 A 18/01/2022 15:00 -2.74 -69.8 -71.9 5.16 369. RET RET
# 4 A 18/01/2022 15:30 -2.65 -69.8 -71.9 5.16 367. RET RET
# 5 A 18/01/2022 16:00 -2.57 -69.8 -71.9 5.16 363. RET RET
# 6 B 18/01/2022 21:30 -3.05 -69.8 -71.9 5.16 380. OUT OUT
# 7 B 18/01/2022 22:00 -3.08 -69.8 -71.9 5.16 382. OUT OUT
# 8 B 19/01/2022 00:30 -3.03 -69.6 -71.9 5.16 391. MAX MAX
# 9 B 19/01/2022 01:00 -2.90 -69.7 -71.9 5.16 384. RET RET
# 10 B 19/01/2022 01:30 -2.91 -69.8 -71.9 5.16 377. RET RET
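Because the comparison above is made on row positions within each trip, it relies on the rows already being in chronological order. The following is a minimal base-R sketch of one way to enforce that first, not part of the original answer: the toy values are made up, `Time` and `classify_trip` are hypothetical names, and UTC is an assumed time zone.

```r
# Toy data for one trip, deliberately out of chronological order (made-up values)
trips <- data.frame(
  Trip_ID = c("B", "B", "B", "B"),
  Timestamp = c("19/01/2022 01:00", "18/01/2022 21:30",
                "19/01/2022 00:30", "18/01/2022 22:00"),
  Dist_to_Colony = c(384.4, 380.0, 390.8, 382.4)
)

# Parse the day/month/year timestamps; the time zone is an assumption
trips$Time <- as.POSIXct(trips$Timestamp, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Sort by trip, then chronologically within each trip
trips <- trips[order(trips$Trip_ID, trips$Time), ]

# Hypothetical helper: label one trip's distances relative to the position of the maximum
classify_trip <- function(d) {
  i <- seq_along(d)
  peak <- which.max(d)  # position of the first maximum
  ifelse(i < peak, "OUT", ifelse(i == peak, "MAX", "RET"))
}

# Apply the helper per trip and put the labels back in row order
trips$Loc_Class <- unsplit(
  lapply(split(trips$Dist_to_Colony, trips$Trip_ID), classify_trip),
  trips$Trip_ID
)
trips[, c("Timestamp", "Dist_to_Colony", "Loc_Class")]
```

Note that `which.max` returns the first maximum, so if a trip had tied maximum distances, only the earliest would be labelled MAX.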
The `comp3` function is really just a ternary-result comparison function: instead of something like `+(vec > val)`, which returns just `0` (false) and `1` (true), it gives a third result when the two are equal. For example,
comp3(1:5, 4)
# [1] -1 -1 -1 0 1
The extension to that is the `out=` argument, which lets the user specify what the three values should be instead of `-1:1`. (If you want to shorten the dplyr code, feel free to hard-code the default value of `out=` to be your string vector.)
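As a small illustration of that suggestion, here is a sketch of the same function with only the default of `out=` changed to the three labels used in the question; everything else is as defined earlier.

```r
# Same three-way comparison, with the labels hard-coded as the default for out=
comp3 <- function(vec, val, out = c("OUT", "MAX", "RET")) {
  ifelse(abs(vec - val) < 1e-9, out[2], ifelse(vec < val, out[1], out[3]))
}

comp3(1:5, 4)
# [1] "OUT" "OUT" "OUT" "MAX" "RET"

# The numeric codes are still available by passing out= explicitly
comp3(1:5, 4, out = -1:1)
# [1] -1 -1 -1  0  1
```

With this default, the `mutate()` call no longer needs the third argument.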
Another note: the use of `abs(vec - val) < 1e-9` is another step towards generalizing it: if given floating-point (`numeric`) values, we might be subject to problems with strict floating-point equality for numbers of high precision (cf. "Why are these numbers not equal?", "Is floating point math broken?", and https://en.wikipedia.org/wiki/IEEE_754). In this case, where integer row positions are being compared, it's a little overkill, but it will not return a different value. (And since you talk of a table with 4000 or so locations, the "overhead" of doing this one extra step will likely not be human-apparent.)
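For readers unfamiliar with the floating-point issue, the classic example can be sketched in a few lines of base R. The `1e-9` tolerance mirrors the one used in `comp3`; the variable names are only illustrative.

```r
a <- 0.1 + 0.2
b <- 0.3

# Strict equality fails because neither value is exactly representable in binary
a == b
# [1] FALSE

# Comparing within a small tolerance treats them as equal
abs(a - b) < 1e-9
# [1] TRUE

# Base R's all.equal() applies a tolerance as well; wrap it in isTRUE() for a plain logical
isTRUE(all.equal(a, b))
# [1] TRUE
```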