Classify timestamps as occurring before or after a maximum distance is reached, in R

Question

I have a dataframe consisting of a series of timestamps with lat-lon point locations from animal GPS tracking data, grouped into separate trips made by each animal. For each timestamped lat-lon, I also have the distance from the point to the animal's home colony (in km).

I would like to classify each point according to whether it occurred before or after the animal reached its maximum distance from its home colony.

The aim is to have a column in the dataframe stating whether the timestamped lat-lon occurs during the outward section of the animal's trip (defined as all points before the animal reached its maximum distance from its home colony) or the return section (all points that occurred after the animal reached its maximum distance from its home colony and before it returned to the colony).

Here is example data from 2 trips:

My desired output is the table below, with the addition of the 'Loc_Class' (location classification) column, where MAX = the point of maximum distance from the colony, OUT = points falling before the animal reaches that MAX, and RET = points after the animal has reached its maximum distance from the colony and is returning to it.

| Trip_ID | Timestamp        | LON       | LAT        | Colony_lat | Colony_lon | Dist_to_Colony | Loc_Class |
| ------- | ---------------- | --------- | ---------- | ---------- | ---------- | -------------- | --------- |
| A       | 18/01/2022 14:00 | -2.81698  | -69.831474 | -71.89     | 5.159      | 369.9948202    | MAX       |
| A       | 18/01/2022 14:30 | -2.750411 | -69.811873 | -71.89     | 5.159      | 369.5644383    | RET       |
| A       | 18/01/2022 15:00 | -2.736943 | -69.811022 | -71.89     | 5.159      | 369.2463158    | RET       |
| A       | 18/01/2022 15:30 | -2.645026 | -69.804136 | -71.89     | 5.159      | 367.1665826    | RET       |
| A       | 18/01/2022 16:00 | -2.56825  | -69.833432 | -71.89     | 5.159      | 362.7877481    | RET       |
| B       | 18/01/2022 21:30 | -3.046828 | -69.784849 | -71.89     | 5.159      | 380.0350746    | OUT       |
| B       | 18/01/2022 22:00 | -3.080154 | -69.765688 | -71.89     | 5.159      | 382.4142364    | OUT       |
| B       | 19/01/2022 00:30 | -3.025742 | -69.634483 | -71.89     | 5.159      | 390.8078861    | MAX       |
| B       | 19/01/2022 01:00 | -2.898522 | -69.672147 | -71.89     | 5.159      | 384.3511473    | RET       |
| B       | 19/01/2022 01:30 | -2.907463 | -69.769916 | -71.89     | 5.159      | 377.173593     | RET       |

library(tidyverse)   # attaches dplyr, so a separate library(dplyr) is not needed
library(geosphere)

# Load dataframe
df <- read.csv("Tracking_Data.csv")

# Geodesic distance (in metres) between each timestamped location and the animal's colony
df_2 <- df %>% mutate(Dist_to_Colony = distGeo(cbind(LON, LAT), cbind(Colony_lon, Colony_lat)))

# Convert distance to colony from m to km
df_2 <- df_2 %>% mutate(Dist_to_Colony = Dist_to_Colony / 1000)

# Find the maximum distance to colony for each trip
Max_dist_colony <- df_2 %>% group_by(Trip_ID) %>% summarise(Dist_to_Colony = max(Dist_to_Colony))

# So now I need to classify each point using the 'Timestamp' and 'Dist_to_Colony' columns and make a 'Loc_Class' column.

# Example df

| Trip_ID | Timestamp      | LON     | LAT        |Colony_lat|Colony_lon|Dist_to_Colony|
| ------- | -------------- | ------- | ---------- | -------- | -------- | ------------ |
|A     |18/01/2022 14:00  |-2.81698 |-69.831474  |  -71.89  |5.159     |369.9948202   |
|A     |18/01/2022 14:30  |-2.750411|-69.811873  |  -71.89  |5.159     |369.5644383   |
|A     |18/01/2022 15:00  |-2.736943|-69.811022  |  -71.89  |5.159     |369.2463158   |
|A     |18/01/2022 15:30  |-2.645026|-69.804136  |  -71.89  |5.159     |367.1665826   |
|A     |18/01/2022 16:00  |-2.56825 |-69.833432  |  -71.89  |5.159     |362.7877481   |
|B     |18/01/2022 21:30  |-3.046828|-69.784849  |  -71.89  |5.159     |380.0350746   |
|B     |18/01/2022 22:00  |-3.080154|-69.765688  |  -71.89  |5.159     |382.4142364   |
|B     |19/01/2022 00:30  |-3.025742|-69.634483  |  -71.89  |5.159     |390.8078861   |
|B     |19/01/2022 01:00  |-2.898522|-69.672147  |  -71.89  |5.159     |384.3511473   |
|B     |19/01/2022 01:30  |-2.907463|-69.769916  |  -71.89  |5.159     |377.173593    |

Answer 1

Something like this?

comp3 <- function(vec, val, out = -1:1) ifelse(abs(vec - val) < 1e-9, out[2], ifelse(vec < val, out[1], out[3]))
quux %>%
  group_by(Trip_ID) %>%
  mutate(Direction = comp3(row_number(), which.max(Dist_to_Colony), c("OUT", "MAX", "RET"))) %>%
  ungroup()
# # A tibble: 10 x 9
#    Trip_ID Timestamp          LON   LAT Colony_lat Colony_lon Dist_to_Colony Loc_Class Direction
#    <chr>   <chr>            <dbl> <dbl>      <dbl>      <dbl>          <dbl> <chr>     <chr>    
#  1 A       18/01/2022 14:00 -2.82 -69.8      -71.9       5.16           370. MAX       MAX      
#  2 A       18/01/2022 14:30 -2.75 -69.8      -71.9       5.16           370. RET       RET      
#  3 A       18/01/2022 15:00 -2.74 -69.8      -71.9       5.16           369. RET       RET      
#  4 A       18/01/2022 15:30 -2.65 -69.8      -71.9       5.16           367. RET       RET      
#  5 A       18/01/2022 16:00 -2.57 -69.8      -71.9       5.16           363. RET       RET      
#  6 B       18/01/2022 21:30 -3.05 -69.8      -71.9       5.16           380. OUT       OUT      
#  7 B       18/01/2022 22:00 -3.08 -69.8      -71.9       5.16           382. OUT       OUT      
#  8 B       19/01/2022 00:30 -3.03 -69.6      -71.9       5.16           391. MAX       MAX      
#  9 B       19/01/2022 01:00 -2.90 -69.7      -71.9       5.16           384. RET       RET      
# 10 B       19/01/2022 01:30 -2.91 -69.8      -71.9       5.16           377. RET       RET
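One caveat worth spelling out: the code above compares row positions, so it relies on the rows within each trip already being in chronological order. Below is a minimal base-R sketch of sorting first, assuming Timestamp is stored as day/month/year text (which does not sort correctly as plain text across month boundaries). The `trips` data frame, the `Timestamp_parsed` column, the rounded distances, and the UTC time zone are all illustrative assumptions, not part of the original data.

```r
# Toy data for one trip, deliberately shuffled out of time order (hypothetical values)
trips <- data.frame(
  Trip_ID = c("B", "B", "B", "B"),
  Timestamp = c("19/01/2022 01:00", "18/01/2022 21:30",
                "19/01/2022 00:30", "18/01/2022 22:00"),
  Dist_to_Colony = c(384.35, 380.04, 390.81, 382.41),
  stringsAsFactors = FALSE
)

# Parse the text into a real date-time; format string and time zone are assumptions
trips$Timestamp_parsed <- as.POSIXct(trips$Timestamp,
                                     format = "%d/%m/%Y %H:%M", tz = "UTC")

# Sort by trip, then chronologically within each trip
trips <- trips[order(trips$Trip_ID, trips$Timestamp_parsed), ]

# Row position within each trip, and the position of that trip's maximum distance
pos     <- ave(seq_along(trips$Trip_ID), trips$Trip_ID, FUN = seq_along)
max_pos <- ave(trips$Dist_to_Colony, trips$Trip_ID, FUN = which.max)

# Same three-way classification as above, written out with nested ifelse()
trips$Loc_Class <- ifelse(pos == max_pos, "MAX",
                          ifelse(pos < max_pos, "OUT", "RET"))
print(trips[, c("Trip_ID", "Timestamp", "Dist_to_Colony", "Loc_Class")])
```

In a dplyr pipeline, the equivalent would be an arrange() on the trip and the parsed timestamp before the group_by() step.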

The comp3 function is really just a ternary-result comparison function: instead of something like +(vec > val), which returns just 0 (false) and 1 (true), this gives a third result when the two are equal. For example,

comp3(1:5, 4)
# [1] -1 -1 -1  0  1

The extension to that is the out= argument, which allows the user to specify what the three values should be instead of -1:1. (If you want to shorten the dplyr code, feel free to hard-code the default value of out= to be your string vector.)
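As a small illustration of the out= argument, here is the same call with string labels, followed by one possible way to hard-code them as the default. The wrapper name `loc_class` is hypothetical and only used for this sketch.

```r
# comp3 as defined in the answer above
comp3 <- function(vec, val, out = -1:1) {
  ifelse(abs(vec - val) < 1e-9, out[2], ifelse(vec < val, out[1], out[3]))
}

# Passing the three labels explicitly
labels <- comp3(1:5, 4, out = c("OUT", "MAX", "RET"))
print(labels)
# [1] "OUT" "OUT" "OUT" "MAX" "RET"

# Hypothetical wrapper with the labels hard-coded as the default
loc_class <- function(vec, val, out = c("OUT", "MAX", "RET")) comp3(vec, val, out)
labels_default <- loc_class(1:5, 4)
```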

Another note: the use of abs(vec - val) < 1e-9 is another step towards generalizing it: if given floating-point (numeric) values, strict equality tests can fail for numbers that differ only by rounding error (cf. "Why are these numbers not equal?", "Is floating point math broken?", and https://en.wikipedia.org/wiki/IEEE_754). In this case it's a little overkill, since row numbers are integers, but it will not return a different value. (And since you talk of a table with 4000 or so locations, the "overhead" of doing this one extra step will likely not be noticeable.)
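To make the floating-point point concrete, here is a short sketch using the classic 0.1 + 0.2 case; the tolerance of 1e-9 matches the one used in comp3, and the variable names are just for illustration.

```r
x <- 0.1 + 0.2

# Strict equality fails because of binary rounding error
strict <- x == 0.3
print(strict)
# [1] FALSE

# Comparing within a small tolerance, as comp3 does, succeeds
tolerant <- abs(x - 0.3) < 1e-9
print(tolerant)
# [1] TRUE

# Base R's all.equal() offers a similar tolerance-based comparison
base_check <- isTRUE(all.equal(x, 0.3))
```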
