R: Filtering time series data for each group

I need to filter time series data by group. The filtering has to happen at the beginning and at the end of each group: I would like to remove the rows that fall within the first 5 minutes and the last 2 minutes of each group.

Here is the sample code:

Time <- c("2015-08-21T10:00:51", "2015-08-21T10:02:51", "2015-08-21T10:04:51", "2015-08-21T10:06:51", 
          "2015-08-21T10:08:51", "2015-08-21T10:10:51","2015-08-21T10:12:51", "2015-08-21T10:14:51", 
          "2015-08-21T10:16:51", "2015-08-21T10:18:51", "2015-08-21T10:20:51", "2015-08-21T10:22:51")
x <-  c(38.855, 38.664, 40.386, 40.386, 40.195, 40.386, 40.386, 40.195, 40.386, 38.855, 38.664, 40.386)
y <-  c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b")
data <- data.frame(Time,x,y)
data$Time <- as.POSIXct(data$Time, format = "%Y-%m-%dT%H:%M:%S")

The y column defines the groups, which in this particular case are a and b.

So for this example I would remove the first 3 rows and the last 2 rows for group a, and the same for group b (in my original data it will not be that easy to remove rows by row count). What I would get at the end is something like this:

                  Time      x y
4  2015-08-21 10:06:51 40.386 a
10 2015-08-21 10:18:51 38.855 b
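
To see why exactly these two rows survive, the per-group cutoffs can be computed directly. This is only a sketch: the names lower, upper and cutoffs are made up for illustration, tz = "UTC" is an assumption added so the result does not depend on the local time zone, and it assumes that rows lying exactly on a cutoff are dropped as well, which is what the expected output implies.

```r
# Sample timestamps and groups from the question (tz = "UTC" is an assumption).
Time <- as.POSIXct(
  c("2015-08-21T10:00:51", "2015-08-21T10:02:51", "2015-08-21T10:04:51",
    "2015-08-21T10:06:51", "2015-08-21T10:08:51", "2015-08-21T10:10:51",
    "2015-08-21T10:12:51", "2015-08-21T10:14:51", "2015-08-21T10:16:51",
    "2015-08-21T10:18:51", "2015-08-21T10:20:51", "2015-08-21T10:22:51"),
  format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")
y <- rep(c("a", "b"), each = 6)

# Start of each group plus 5 minutes: rows must be later than this.
lower <- tapply(Time, y, function(t) format(min(t) + 5 * 60, "%H:%M:%S", tz = "UTC"))
# End of each group minus 2 minutes: rows must be earlier than this.
upper <- tapply(Time, y, function(t) format(max(t) - 2 * 60, "%H:%M:%S", tz = "UTC"))

cutoffs <- data.frame(group = names(lower),
                      keep_after = as.vector(lower),
                      keep_before = as.vector(upper),
                      stringsAsFactors = FALSE)
print(cutoffs)
```

For group a the window is 10:05:51 to 10:08:51, and for group b it is 10:17:51 to 10:20:51, so only the 10:06:51 and 10:18:51 rows fall strictly inside.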

I want to point out that this is only sample data!

Thanks for the help!

I would rather filter the data based on the time column than on row counts. My original data is not as nicely structured as this sample, and the number of rows per group varies.

What about this? Split the data.frame by group, compute each group's start time plus five minutes and end time minus two minutes, keep only the rows strictly between those two cutoffs, and bind the results back together.

xy <- split(data, data$y)

xy <- lapply(xy, FUN = function(m) {
  m[(m$Time > min(m$Time) + (5 * 60)) & ((max(m$Time) - (2 * 60)) > m$Time), ]
})

do.call("rbind", xy)

                    Time      x y
a    2015-08-21 10:06:51 40.386 a
b    2015-08-21 10:18:51 38.855 b
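
If keeping the original row names (4 and 10) matters, the same filter can also be written without split() and rbind(). The following is a minimal base R sketch, not part of the original answer: keep_inner and its arguments head_secs and tail_secs are hypothetical names, and tz = "UTC" is an assumption added so the parsing does not depend on the local time zone.

```r
# Rebuild the sample data from the question.
Time <- c("2015-08-21T10:00:51", "2015-08-21T10:02:51", "2015-08-21T10:04:51",
          "2015-08-21T10:06:51", "2015-08-21T10:08:51", "2015-08-21T10:10:51",
          "2015-08-21T10:12:51", "2015-08-21T10:14:51", "2015-08-21T10:16:51",
          "2015-08-21T10:18:51", "2015-08-21T10:20:51", "2015-08-21T10:22:51")
x <- c(38.855, 38.664, 40.386, 40.386, 40.195, 40.386,
       40.386, 40.195, 40.386, 38.855, 38.664, 40.386)
y <- rep(c("a", "b"), each = 6)
data <- data.frame(Time, x, y)
data$Time <- as.POSIXct(data$Time, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")

# Hypothetical helper: TRUE for timestamps (in seconds) strictly inside the
# window that starts head_secs after the minimum and ends tail_secs before
# the maximum.
keep_inner <- function(t, head_secs = 5 * 60, tail_secs = 2 * 60) {
  t > min(t) + head_secs & t < max(t) - tail_secs
}

# ave() applies the helper within each level of y and returns the results in
# the original row order; it yields 0/1 here, so convert back to logical.
keep <- as.logical(ave(as.numeric(data$Time), data$y, FUN = keep_inner))

result <- data[keep, ]
print(result)
```

Because ave() returns a vector aligned with the original rows, the subset keeps the data.frame's own row names instead of the group labels.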

I understand it's cool these days to also present a dplyr solution, so here it is.

library(dplyr)

data %>%
  group_by(y) %>%
  filter((Time > (min(Time) + (5*60))) & (max(Time) - (2*60) > Time))
