Classify a timestamp as occurring before or after a distance limit is reached in R

I have a data frame consisting of a series of timestamps with lat-lon point locations from animal GPS tracking data, grouped into separate trips made by each animal. For each timestamped lat-lon, I also have the distance from the point to the animal's home colony (in km).

I would like to classify each point by whether it occurred before or after the animal reached its maximum distance from its home colony.

The aim is to have a column in the data frame stating whether the timestamped lat-lon occurs during the outward section of the animal's trip (defined as all points before the animal reached its maximum distance from its home colony) or the return section (all points that occurred after the animal reached its maximum distance from its home colony and before it returned to the colony).

Here is example data from two trips:

My desired output is the table below, which adds a 'Loc_Class' (location classification) column, where MAX = the point at the maximum distance from the colony, OUT = points before the animal reaches that maximum, and RET = points after the animal has reached its maximum distance from the colony and is returning to it.

| Trip_ID | Timestamp        | LON       | LAT        | Colony_lat | Colony_lon | Dist_to_Colony | Loc_Class |
| ------- | ---------------- | --------- | ---------- | ---------- | ---------- | -------------- | --------- |
| A       | 18/01/2022 14:00 | -2.81698  | -69.831474 | -71.89     | 5.159      | 369.9948202    | MAX       |
| A       | 18/01/2022 14:30 | -2.750411 | -69.811873 | -71.89     | 5.159      | 369.5644383    | RET       |
| A       | 18/01/2022 15:00 | -2.736943 | -69.811022 | -71.89     | 5.159      | 369.2463158    | RET       |
| A       | 18/01/2022 15:30 | -2.645026 | -69.804136 | -71.89     | 5.159      | 367.1665826    | RET       |
| A       | 18/01/2022 16:00 | -2.56825  | -69.833432 | -71.89     | 5.159      | 362.7877481    | RET       |
| B       | 18/01/2022 21:30 | -3.046828 | -69.784849 | -71.89     | 5.159      | 380.0350746    | OUT       |
| B       | 18/01/2022 22:00 | -3.080154 | -69.765688 | -71.89     | 5.159      | 382.4142364    | OUT       |
| B       | 19/01/2022 00:30 | -3.025742 | -69.634483 | -71.89     | 5.159      | 390.8078861    | MAX       |
| B       | 19/01/2022 01:00 | -2.898522 | -69.672147 | -71.89     | 5.159      | 384.3511473    | RET       |
| B       | 19/01/2022 01:30 | -2.907463 | -69.769916 | -71.89     | 5.159      | 377.173593     | RET       |

```r
library(tidyverse)  # also attaches dplyr
library(geosphere)

# load dataframe
df <- read.csv("Tracking_Data.csv")

# Great circle (geodesic): add the distance in metres between each timestamped location and the animal's colony
df_2 <- df %>% mutate(dist_to_colony = distGeo(cbind(LON, LAT), cbind(Colony_lon, Colony_lat)))

# change distance from colony from m to km
df_2 <- df_2 %>% mutate(dist_to_colony = dist_to_colony / 1000)

# find the maximum distance to the colony reached on each trip (grouping column is Trip_ID)
Max_dist_colony <- df_2 %>% group_by(Trip_ID) %>% summarise(across(c(dist_to_colony), max))

# so now I need to classify each point using the 'Timestamp' and 'Dist_to_Colony' columns and make a 'Loc_Class' column:
```

Example df:

| Trip_ID  | Timestamp        | LON      | LAT       |Colony_lat|Colony_lon|Dist_to_Colony|
| -------- | ---------------- | -------- | ---------- | -------- | -------- | ------------ |
|A     |18/01/2022 14:00  |-2.81698 |-69.831474  |  -71.89  |5.159     |369.9948202   |
|A     |18/01/2022 14:30  |-2.750411|-69.811873  |  -71.89  |5.159     |369.5644383   |
|A     |18/01/2022 15:00  |-2.736943|-69.811022  |  -71.89  |5.159     |369.2463158   |
|A     |18/01/2022 15:30  |-2.645026|-69.804136  |  -71.89  |5.159     |367.1665826   |
|A     |18/01/2022 16:00  |-2.56825 |-69.833432  |  -71.89  |5.159     |362.7877481   |
|B     |18/01/2022 21:30  |-3.046828|-69.784849  |  -71.89  |5.159     |380.0350746   |
|B     |18/01/2022 22:00  |-3.080154|-69.765688  |  -71.89  |5.159     |382.4142364   |
|B     |19/01/2022 00:30  |-3.025742|-69.634483  |  -71.89  |5.159     |390.8078861   |
|B     |19/01/2022 01:00  |-2.898522|-69.672147  |  -71.89  |5.159     |384.3511473   |
|B     |19/01/2022 01:30  |-2.907463|-69.769916  |  -71.89  |5.159     |377.173593    |
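
A minimal sketch of one way to finish the approach in the question, assuming the column names of the example table (`Trip_ID`, `Timestamp`, `Dist_to_Colony`) and day-first `dd/mm/yyyy HH:MM` timestamps. Instead of keeping only the maximum distance per trip, it keeps the *time* at which that maximum occurs, joins it back, and compares each fix's time against it. The object names `fixes`, `max_time` and `classified` are hypothetical, and the three columns are copied from the example table.

```r
library(dplyr)

# Hypothetical reconstruction of three columns of the example data
fixes <- data.frame(
  Trip_ID = c(rep("A", 5), rep("B", 5)),
  Timestamp = c("18/01/2022 14:00", "18/01/2022 14:30", "18/01/2022 15:00",
                "18/01/2022 15:30", "18/01/2022 16:00", "18/01/2022 21:30",
                "18/01/2022 22:00", "19/01/2022 00:30", "19/01/2022 01:00",
                "19/01/2022 01:30"),
  Dist_to_Colony = c(369.9948202, 369.5644383, 369.2463158, 367.1665826,
                     362.7877481, 380.0350746, 382.4142364, 390.8078861,
                     384.3511473, 377.173593)
)

# Parse the day-first strings so comparisons are chronological, not alphabetical
fixes <- fixes %>%
  mutate(time = as.POSIXct(Timestamp, format = "%d/%m/%Y %H:%M", tz = "UTC"))

# For each trip, the time of the (first) maximum distance to the colony
max_time <- fixes %>%
  group_by(Trip_ID) %>%
  summarise(max_time = time[which.max(Dist_to_Colony)])

# Join that time back and label each fix relative to it
classified <- fixes %>%
  left_join(max_time, by = "Trip_ID") %>%
  mutate(Loc_Class = case_when(
    time < max_time  ~ "OUT",
    time == max_time ~ "MAX",
    TRUE             ~ "RET"
  )) %>%
  select(-max_time)

classified$Loc_Class
# [1] "MAX" "RET" "RET" "RET" "RET" "OUT" "OUT" "MAX" "RET" "RET"
```

Comparing times rather than row positions means the result does not depend on the rows already being sorted.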

Something like this? (Here `quux` is the example data from the question, including its `Loc_Class` column for comparison.)

```r
library(dplyr)

# Three-way comparison: out[1] if vec < val, out[2] if (nearly) equal, out[3] if vec > val
comp3 <- function(vec, val, out = -1:1) ifelse(abs(vec - val) < 1e-9, out[2], ifelse(vec < val, out[1], out[3]))

# Within each trip, compare each row's position with the position of the maximum distance
# (this assumes the rows of each trip are in chronological order)
quux %>%
  group_by(Trip_ID) %>%
  mutate(Direction = comp3(row_number(), which.max(Dist_to_Colony), c("OUT", "MAX", "RET"))) %>%
  ungroup()
# # A tibble: 10 x 9
#    Trip_ID Timestamp          LON   LAT Colony_lat Colony_lon Dist_to_Colony Loc_Class Direction
#    <chr>   <chr>            <dbl> <dbl>      <dbl>      <dbl>          <dbl> <chr>     <chr>    
#  1 A       18/01/2022 14:00 -2.82 -69.8      -71.9       5.16           370. MAX       MAX      
#  2 A       18/01/2022 14:30 -2.75 -69.8      -71.9       5.16           370. RET       RET      
#  3 A       18/01/2022 15:00 -2.74 -69.8      -71.9       5.16           369. RET       RET      
#  4 A       18/01/2022 15:30 -2.65 -69.8      -71.9       5.16           367. RET       RET      
#  5 A       18/01/2022 16:00 -2.57 -69.8      -71.9       5.16           363. RET       RET      
#  6 B       18/01/2022 21:30 -3.05 -69.8      -71.9       5.16           380. OUT       OUT      
#  7 B       18/01/2022 22:00 -3.08 -69.8      -71.9       5.16           382. OUT       OUT      
#  8 B       19/01/2022 00:30 -3.03 -69.6      -71.9       5.16           391. MAX       MAX      
#  9 B       19/01/2022 01:00 -2.90 -69.7      -71.9       5.16           384. RET       RET      
# 10 B       19/01/2022 01:30 -2.91 -69.8      -71.9       5.16           377. RET       RET      
```
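
The `which.max()` approach above relies on the rows of each trip already being in chronological order, which holds for the example data. A minimal base-R sketch (no dplyr), assuming day-first `dd/mm/yyyy HH:MM` timestamps, that sorts each trip by parsed time before labelling. The shuffled five-row `fixes` subset and the helper `classify_trip()` are hypothetical.

```r
# Hypothetical shuffled subset of the example data
fixes <- data.frame(
  Trip_ID = c("B", "A", "B", "A", "B"),
  Timestamp = c("19/01/2022 00:30", "18/01/2022 14:30", "18/01/2022 22:00",
                "18/01/2022 14:00", "19/01/2022 01:00"),
  Dist_to_Colony = c(390.8078861, 369.5644383, 382.4142364,
                     369.9948202, 384.3511473)
)

# Parse the day-first strings so ordering is chronological
fixes$time <- as.POSIXct(fixes$Timestamp, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Sort by trip, then by time within each trip
fixes <- fixes[order(fixes$Trip_ID, fixes$time), ]

# Label one trip's time-ordered distances relative to the first maximum
classify_trip <- function(d) {
  i_max <- which.max(d)
  pos <- seq_along(d)
  ifelse(pos == i_max, "MAX", ifelse(pos < i_max, "OUT", "RET"))
}

# Apply the labelling separately to each trip's rows
fixes$Loc_Class <- NA_character_
for (id in unique(fixes$Trip_ID)) {
  rows <- which(fixes$Trip_ID == id)
  fixes$Loc_Class[rows] <- classify_trip(fixes$Dist_to_Colony[rows])
}

fixes[, c("Trip_ID", "Timestamp", "Loc_Class")]
```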

The comp3 function is really just a three-way comparison function: instead of something like +(vec > val), which returns only 0 (false) and 1 (true), it gives a third result when the two values are equal. For example,

```r
comp3(1:5, 4)
# [1] -1 -1 -1  0  1
```
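
For integer inputs such as row numbers, the same three-way result can be sketched with base R's `sign()`, which returns -1, 0 or 1; adding 2 turns that into an index into `out`. This is a swapped-in alternative, not the answer's function: the name `comp3_int` is hypothetical, and because it compares exactly it is only meant for integers.

```r
# Three-way comparison for integers: sign() gives -1, 0, 1 and +2 maps
# those to positions 1, 2, 3 of `out`
comp3_int <- function(vec, val, out = -1:1) out[sign(vec - val) + 2]

comp3_int(1:5, 4)
# [1] -1 -1 -1  0  1
comp3_int(1:5, 4, c("OUT", "MAX", "RET"))
# [1] "OUT" "OUT" "OUT" "MAX" "RET"
```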

The extension to that is the out= argument, which lets the user specify what the three values should be instead of -1:1. (If you want to shorten the dplyr code, feel free to hard-code the default value of out= to be your string vector.)
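
Following that suggestion, a sketch of `comp3` with the trip-phase labels hard-coded as the default, so the call inside `mutate()` could shrink to `comp3(row_number(), which.max(Dist_to_Colony))`. The name `comp3_lab` is hypothetical; the body is the answer's `comp3` unchanged apart from the default.

```r
# comp3 from the answer, with the labels as the default `out`
comp3_lab <- function(vec, val, out = c("OUT", "MAX", "RET")) {
  ifelse(abs(vec - val) < 1e-9, out[2], ifelse(vec < val, out[1], out[3]))
}

# Trip B's distances from the example table; the maximum is the 3rd fix
d <- c(380.0350746, 382.4142364, 390.8078861, 384.3511473, 377.173593)
comp3_lab(seq_along(d), which.max(d))
# [1] "OUT" "OUT" "MAX" "RET" "RET"
```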

Another note: the use of abs(vec - val) < 1e-9 is another step towards generalizing it: with floating-point (numeric) values, strict equality can fail for numbers that should be equal (cf. Why are these numbers not equal?, Is floating point math broken?, and https://en.wikipedia.org/wiki/IEEE_754). In this case, where the values being compared are integer row positions, it's a little overkill, but it does not change the result. (And since you mention a table with 4000 or so locations, the overhead of this extra step is unlikely to be noticeable.)
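
A short illustration of why `comp3` compares with a tolerance instead of `==`: with doubles, values that are mathematically equal can differ in their last bits. Base R's `isTRUE(all.equal(...))` is the built-in tolerance-based alternative (its default tolerance is about 1.5e-8); the 1e-9 threshold is simply the one the answer uses. The answer's `comp3` is repeated so the block runs on its own.

```r
x <- 0.1 + 0.2
x == 0.3                   # FALSE: 0.1 + 0.2 is not exactly 0.3 in binary
abs(x - 0.3) < 1e-9        # TRUE: equal within the answer's tolerance
isTRUE(all.equal(x, 0.3))  # TRUE: base R's tolerance-based comparison

# comp3 from the answer treats x as "equal" to 0.3
comp3 <- function(vec, val, out = -1:1) ifelse(abs(vec - val) < 1e-9, out[2], ifelse(vec < val, out[1], out[3]))
comp3(c(0.1, x, 0.5), 0.3)
# [1] -1  0  1
```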

