Find start and end idx of a time series by group in a data.table
I have a data.table that looks like this:
data <- data.table(time = c(0, 1, 2, 3, 4, 5, 6, 7),
anom = c(0, 0, 1, 1, 1, 0, 0, 0),
gier = c(0, 0, 4, 9, 7, 0, 0, 0))
Now I am calculating some summary statistics of the column gier, grouped by the column anom, like this:
cols <- c("gier")
statFun <- function(x) list(mean = mean(x), median = median(x), std = sd(x))
statSum <- data[, unlist(lapply(.SD, statFun), recursive = FALSE), .SDcols = cols, by = anom]
This is fine, but I want to go a step further and add the start and end points of time for each consecutive run of the anom groups (0 and 1). In the end I would have something like a new time series that contains only the start and end points of time. The result should look like this:
res <- data.table(x.start = c(0, 2, 5),
x.end = c(1, 4, 7),
anom = c(0, 1, 0),
gier.mean = c(0, 6.666, 0),
gier.median = c(0, 7, 0),
gier.std = c(0, 2.516, 0))
How can I achieve this?
Addition: is there a way to get this result for multiple columns, and not only for a single column like gier? For example, I can do the following, but I don't know how to extend it with the start and end columns mentioned above. This way there is at least an extra column rn holding the names of the columns the statistics are calculated for.
res <- data[, setDT(do.call(rbind.data.frame, lapply(.SD, statFun)), keep.rownames = TRUE), .SDcols = cols, by = anom]
You can include the additional calculations outside lapply, and group by rleid(anom) so that each consecutive run of anom forms its own group:
library(data.table)
data[, unlist(c(lapply(.SD, statFun),
anom = first(anom), x.start = first(time), x.end = last(time)),
recursive = FALSE), rleid(anom), .SDcols = cols]
# rleid gier.mean gier.median gier.std anom x.start x.end
#1: 1 0.000000 0 0.000000 0 0 1
#2: 2 6.666667 7 2.516611 1 2 4
#3: 3 0.000000 0 0.000000 0 5 7
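The same pattern appears to extend to several columns by listing them in cols, which addresses the addition in the question. The following is a minimal sketch under stated assumptions: the second column roll and its values are hypothetical and only serve as illustration, and the start, end and anom columns are placed first to match the layout of the desired result.

```r
library(data.table)

# Example data from the question, plus a hypothetical second column `roll`
data <- data.table(time = c(0, 1, 2, 3, 4, 5, 6, 7),
                   anom = c(0, 0, 1, 1, 1, 0, 0, 0),
                   gier = c(0, 0, 4, 9, 7, 0, 0, 0),
                   roll = c(1, 1, 2, 4, 6, 1, 1, 1))

cols <- c("gier", "roll")
statFun <- function(x) list(mean = mean(x), median = median(x), std = sd(x))

# rleid() gives every consecutive run of equal values its own id,
# so the two separate runs of anom == 0 end up in different groups
run_id <- rleid(data$anom)

# One row per run: start/end of time, the anom value, then the statistics
# of every column named in cols (columns are named like gier.mean, roll.std)
res <- data[, c(list(x.start = first(time), x.end = last(time), anom = first(anom)),
                unlist(lapply(.SD, statFun), recursive = FALSE)),
            by = .(run = rleid(anom)), .SDcols = cols]

# Drop the helper grouping column
res[, run := NULL]
print(res)
```

Grouping by anom alone would merge the two runs of 0 into one group, which is why the run id is used as the grouping key here.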
In dplyr we can do this similarly (rleid still comes from data.table):
library(dplyr)
data %>%
group_by(grp = rleid(anom)) %>%
summarise(across(all_of(cols), list(mean = mean, median = median, std = sd)),
x.start = first(time),
x.end = last(time))
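To get closer to the layout of the desired result with dplyr, the anom value can be kept and the statistic columns named with a dot separator. This is a sketch rather than a definitive version: it assumes that calling data.table::rleid without attaching data.table is acceptable, uses a plain data.frame for the input, and relies on the .names argument of across to produce names such as gier.mean.

```r
library(dplyr)

# Example data from the question, as a plain data.frame
data <- data.frame(time = c(0, 1, 2, 3, 4, 5, 6, 7),
                   anom = c(0, 0, 1, 1, 1, 0, 0, 0),
                   gier = c(0, 0, 4, 9, 7, 0, 0, 0))
cols <- c("gier")

res <- data %>%
  # one group per consecutive run of anom
  group_by(grp = data.table::rleid(anom)) %>%
  summarise(x.start = first(time),
            x.end = last(time),
            anom = first(anom),
            # all_of() selects the columns named in the character vector cols
            across(all_of(cols), list(mean = mean, median = median, std = sd),
                   .names = "{.col}.{.fn}")) %>%
  # drop the helper grouping column
  select(-grp)

print(res)
```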