Removing rows if they occur within a certain time of each other by a group value in R

My data df looks like the following:

Row    Timestamp            ID
1    0020-06-29 12:14:00     B 
2    0020-06-29 12:27:00     A 
3    0020-06-29 12:27:22     B  
4    0020-06-29 12:28:30     A 
5    0020-06-29 12:43:00     B 
6    0020-06-29 12:44:00     C 
7    0020-06-29 12:45:00     B 
8    0020-06-29 12:55:00     A 
9    0020-06-29 12:57:00     C 
10   0020-06-29 13:04:00     B 
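
For anyone who wants to reproduce this, the table above could be rebuilt roughly as follows. This is only a sketch: the timestamps are stored as character strings exactly as displayed (including the year as shown), and the column types are an assumption, since the question does not state them.

```r
# Rebuild the example data shown above.
# Assumption: Timestamp is held as character; convert it with
# as.POSIXct(df$Timestamp, tz = "UTC") if a date-time class is needed.
df <- data.frame(
  Row = 1:10,
  Timestamp = c(
    "0020-06-29 12:14:00", "0020-06-29 12:27:00", "0020-06-29 12:27:22",
    "0020-06-29 12:28:30", "0020-06-29 12:43:00", "0020-06-29 12:44:00",
    "0020-06-29 12:45:00", "0020-06-29 12:55:00", "0020-06-29 12:57:00",
    "0020-06-29 13:04:00"
  ),
  ID = c("B", "A", "B", "A", "B", "C", "B", "A", "C", "B"),
  stringsAsFactors = FALSE
)

# Number of readings per ID
table(df$ID)
```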


   
   

The Timestamp indicates the date and time of a reading, and ID is the tag identification code.

What I am trying to do is remove any Timestamp for a given ID that occurs within 5 minutes of the previous Timestamp for that same ID. For example, ID A is seen in Row 2 and Row 4; since those two rows occur within 5 minutes of each other, we would remove Row 4 but keep Row 2. We would also keep Row 8, where ID A appears again 28 minutes after Row 2.

Update: The first timestamp should take precedence, and every subsequent timestamp should be kept or removed relative to the most recently kept one. So, if we have 3 timestamps for the same ID, with 4.5 minutes between timestamps 1 and 2 and 2 minutes between timestamps 2 and 3, I would like to remove timestamp 2 and keep timestamps 1 and 3. The next timestamp we keep would then be the first one that occurs at least 5 minutes after timestamp 3, and so on.
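
To make the rule in the update concrete, here is a minimal base R sketch for a single ID. The helper name keep_spaced and the sample times are made up for illustration; the function assumes its input is already sorted in ascending order and compares each timestamp with the last one that was kept, not with its immediate predecessor.

```r
# Hypothetical helper (not from the original post): returns TRUE for each
# timestamp that is at least `gap_secs` seconds after the most recently
# KEPT timestamp. The first timestamp is always kept.
# Assumes `times` is sorted in ascending order.
keep_spaced <- function(times, gap_secs = 5 * 60) {
  secs <- as.numeric(times)      # seconds since the epoch
  keep <- logical(length(secs))
  last_kept <- -Inf
  for (i in seq_along(secs)) {
    if (secs[i] - last_kept >= gap_secs) {
      keep[i] <- TRUE
      last_kept <- secs[i]
    }
  }
  keep
}

# The three-timestamp case from the update: gaps of 4.5 and 2 minutes.
# The date itself is arbitrary sample data.
t0 <- as.POSIXct("2020-06-29 12:00:00", tz = "UTC")
times <- t0 + c(0, 4.5 * 60, 6.5 * 60)
keep_spaced(times)
# [1]  TRUE FALSE  TRUE
```

Timestamp 3 is kept because it is 6.5 minutes after timestamp 1, the last kept reading, even though it is only 2 minutes after timestamp 2.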

I have tried the following:

first_date <- df$Timestamp[1:(length(df$Timestamp)-1)]
second_date <- df$Timestamp[2:length(df$Timestamp)]
second_gap <- difftime(second_date, first_date, units="mins")

dup_index <- second_gap>5 # set this as a 5-minute threshold
dup_index <- c(TRUE, dup_index)
df_cleaned <- df[dup_index, ]

But this deletes all observations within 5 minutes of each other and does not take the ID into account. I would usually just subset, but I am working with around 180 unique IDs.
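
To see the problem on the sample data, the attempt above can be run on a rebuilt copy of the table. This is a sketch that assumes the timestamps parse with as.POSIXct in UTC; the variable names are illustrative.

```r
# Sample data from the question (timestamps kept verbatim).
df <- data.frame(
  Row = 1:10,
  Timestamp = c(
    "0020-06-29 12:14:00", "0020-06-29 12:27:00", "0020-06-29 12:27:22",
    "0020-06-29 12:28:30", "0020-06-29 12:43:00", "0020-06-29 12:44:00",
    "0020-06-29 12:45:00", "0020-06-29 12:55:00", "0020-06-29 12:57:00",
    "0020-06-29 13:04:00"
  ),
  ID = c("B", "A", "B", "A", "B", "C", "B", "A", "C", "B"),
  stringsAsFactors = FALSE
)

# Same logic as the attempt: gap to the previous ROW, regardless of ID.
ts  <- as.POSIXct(df$Timestamp, tz = "UTC")
gap <- difftime(ts[-1], ts[-length(ts)], units = "mins")
kept_rows <- df$Row[c(TRUE, gap > 5)]
kept_rows
# [1]  1  2  5  8 10
```

Row 3 (ID B) is dropped only because it comes 22 seconds after Row 2 (ID A), even though the previous reading for ID B was more than 13 minutes earlier.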

Assuming the situation I describe in my comment above does not occur, a possible solution is the following. It compares each timestamp with the one immediately before it within the same ID:

library(tidyverse)
library(lubridate)

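# For each timestamp after the first, TRUE if it is at least 5 minutes
# away from the timestamp immediately before it.
elapsed <- function(x) {
  y <- abs(as.duration(x[2:length(x)] %--% x[1:(length(x) - 1)]))
  y >= 5 * 60
}

df %>%
  group_split(ID) %>%   # one data frame per ID
  # always keep the first row of each ID; keep later rows if elapsed() is TRUE
  map_dfr(~ .[c(TRUE, if (nrow(.) > 1) elapsed(.$Timestamp)), ]) %>%
  arrange(Row)          # restore the original row order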

The output:

# A tibble: 8 × 3
    Row Timestamp           ID   
  <int> <chr>               <chr>
1     1 0020-06-29 12:14:00 B    
2     2 0020-06-29 12:27:00 A    
3     3 0020-06-29 12:27:22 B    
4     5 0020-06-29 12:43:00 B    
5     6 0020-06-29 12:44:00 C    
6     8 0020-06-29 12:55:00 A    
7     9 0020-06-29 12:57:00 C    
8    10 0020-06-29 13:04:00 B    
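
If the chained case from the question's update does matter (each timestamp compared with the last kept one for that ID), one possible base R variant is sketched below. It is an alternative, not the method above: the helper name is hypothetical, and it assumes the rows are already in time order and that the timestamps parse with as.POSIXct in UTC.

```r
# Sample data from the question (timestamps kept verbatim).
df <- data.frame(
  Row = 1:10,
  Timestamp = c(
    "0020-06-29 12:14:00", "0020-06-29 12:27:00", "0020-06-29 12:27:22",
    "0020-06-29 12:28:30", "0020-06-29 12:43:00", "0020-06-29 12:44:00",
    "0020-06-29 12:45:00", "0020-06-29 12:55:00", "0020-06-29 12:57:00",
    "0020-06-29 13:04:00"
  ),
  ID = c("B", "A", "B", "A", "B", "C", "B", "A", "C", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep a reading only if it is at least `gap_secs`
# after the last KEPT reading (input in seconds, sorted ascending).
keep_since_last_kept <- function(secs, gap_secs = 5 * 60) {
  keep <- logical(length(secs))
  last_kept <- -Inf
  for (i in seq_along(secs)) {
    if (secs[i] - last_kept >= gap_secs) {
      keep[i] <- TRUE
      last_kept <- secs[i]
    }
  }
  keep
}

secs <- as.numeric(as.POSIXct(df$Timestamp, tz = "UTC"))

# ave() applies the helper within each ID and returns the results in the
# original row order; the logical result comes back as 0/1, hence as.logical().
keep <- as.logical(ave(secs, df$ID, FUN = keep_since_last_kept))
df_cleaned <- df[keep, ]
df_cleaned$Row
# [1]  1  2  3  5  6  8  9 10
```

On this sample the result matches the output above, because no ID has a run of readings that are each less than 5 minutes apart but span more than 5 minutes in total. The two approaches would differ only in that chained case.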
