R / tibble - subset time series up to variable condition met?
How do I subset a time series from the start up to the first occurrence of a variable meeting a condition?
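library(dplyr)   # filter(), lag(), %>%
library(tibble)  # tribble()

# Sample data, assigned to df so it can be filtered later
df <- tribble(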
~t, ~x, ~y,
as.POSIXct(strptime("2011-03-27 01:30:00", "%Y-%m-%d %H:%M:%S")), -1, 1,
as.POSIXct(strptime("2011-03-27 01:30:01", "%Y-%m-%d %H:%M:%S")), -5, 2,
as.POSIXct(strptime("2011-03-27 03:45:00", "%Y-%m-%d %H:%M:%S")), -3, 5,
as.POSIXct(strptime("2011-03-27 04:20:00", "%Y-%m-%d %H:%M:%S")), -8, 3,
as.POSIXct(strptime("2011-03-27 04:25:00", "%Y-%m-%d %H:%M:%S")), -2, 8
)
For example, all rows from the start to the first occurrence of y > 4 (expecting the first three rows of the sample data).
h3rm4n's solution explained
The simpler case, which does not include the first row that matches the condition, would be:
df %>% filter(cumsum(y > 4) == 0)
y > 4 is FALSE, which counts as 0 in R, for every row before the first match. So cumsum(y > 4) == 0 returns TRUE (and filter keeps the row) for all rows before the first one that matches y > 4. That matching row adds 1 to the running sum, so it and every row after it are dropped.
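The intermediate values can be checked on a plain vector. This is only a small sketch in base R, using the y column of the sample data; the names hit, running and keep are made up for illustration.

```r
# y column from the sample data
y <- c(1, 2, 5, 3, 8)

hit     <- y > 4         # FALSE FALSE  TRUE FALSE  TRUE
running <- cumsum(hit)   # 0 0 1 1 2  (TRUE counts as 1, FALSE as 0)
keep    <- running == 0  # TRUE  TRUE FALSE FALSE FALSE

y[keep]                  # 1 2  (the matching row itself is dropped)
```

Once the running sum leaves 0 it never goes back, so later rows are excluded even if y falls below the threshold again.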
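To have it include the matching row, we additionally apply lag(y, default = 0). This shifts y down by one row, so each row is tested against the previous row's value and the running sum only becomes nonzero one row after the first match.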
You can do the following:
df %>% filter(!cumsum(lag(y, default = 0) > 4))
The result:
# A tibble: 3 x 3
  t                       x     y
  <dttm>              <dbl> <dbl>
1 2011-03-27 01:30:00    -1     1
2 2011-03-27 01:30:01    -5     2
3 2011-03-27 03:45:00    -3     5
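To see why the lag() version keeps the matching row, the same steps can be traced on the y column alone. Again just a sketch, assuming dplyr is installed; prev, running and keep are illustrative names, and dplyr::lag is written out in full to avoid picking up stats::lag.

```r
# y column from the sample data
y <- c(1, 2, 5, 3, 8)

prev    <- dplyr::lag(y, default = 0)  # 0 1 2 5 3  (y shifted down one row)
running <- cumsum(prev > 4)            # 0 0 0 1 1
keep    <- !running                    # TRUE TRUE TRUE FALSE FALSE (0 is FALSE, nonzero is TRUE)

y[keep]                                # 1 2 5  (first match is now included)
```

Note that default = 0 works here only because 0 does not satisfy y > 4. With a different condition, the default would need to be a value that fails it, otherwise the first row would already trip the running sum.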