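R - Conditional calculation based on values in other row and column
My data has the following format:
- first column: indicates whether the computer is running
- second column: the total time the computer has been running
See the dataset below: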
structure(c("", "running", "running", "running", "", "", "",
"running", "running", "", "10", "15", "30", "2", "5", "17", "47",
"12", "57", "87"), .Dim = c(10L, 2L), .Dimnames = list(NULL,
c("c", "v")))
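I would like to add a third column giving the total time the computer has been running, obtained by adding up all the times since the computer started running. See the desired output below: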
[1,] "" "10" "0"
[2,] "running" "15" "15"
[3,] "running" "30" "45"
[4,] "running" "2" "47"
[5,] "" "5" "0"
[6,] "" "17" "0"
[7,] "" "47" "0"
[8,] "running" "12" "12"
[9,] "running" "57" "69"
[10,] "" "87" "0"
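I tried to write some R code to get this in an elegant way, but my programming skills are too limited at the moment. Does anyone know a solution to this problem? Thank you in advance!
First, we convert your data to a more appropriate data structure that can hold mixed data types: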
m <- structure(c("", "running", "running", "running", "", "", "",
"running", "running", "", "10", "15", "30", "2", "5", "17", "47",
"12", "57", "87"), .Dim = c(10L, 2L), .Dimnames = list(NULL,
c("c", "v")))
DF <- as.data.frame(m, stringsAsFactors = FALSE)
DF[] <- lapply(DF, type.convert, as.is = TRUE)
Then we can do this easily with the data.table package:
library(data.table)
setDT(DF)
DF[, total := cumsum(v), by = rleid(c)]
DF[c == "", total := 0]
#          c  v total
# 1:         10     0
# 2: running 15    15
# 3: running 30    45
# 4: running  2    47
# 5:          5     0
# 6:         17     0
# 7:         47     0
# 8: running 12    12
# 9: running 57    69
#10:         87     0
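To see what the grouping by rleid(c) does, here is a minimal base R sketch of the same idea. It is not part of the answer above; the names status, runs and run_id are only illustrative, and the data are retyped from the question. rle() finds runs of identical consecutive values, and repeating each run's index by its length gives one run id per row, which is what rleid() returns.

```r
# Same data as in the question, retyped as plain vectors
status <- c("", "running", "running", "running", "", "", "",
            "running", "running", "")
v <- c(10, 15, 30, 2, 5, 17, 47, 12, 57, 87)

# One id per run of identical consecutive values
runs <- rle(status)
run_id <- rep(seq_along(runs$lengths), runs$lengths)
run_id
# 1 2 2 2 3 3 3 4 4 5

# Cumulative sum within each run, then reset the non-running rows to 0
total <- ave(v, run_id, FUN = cumsum)
total[status == ""] <- 0
total
# 0 15 45 47 0 0 0 12 69 0
```

Because the id changes whenever the value changes, two separate "running" streaks never end up in the same group.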
Here is a simple solution using base R:
DF$total <- ave(DF$v, DF$c, cumsum(DF$c == ""), FUN = cumsum)
DF$total[DF$c == ""] <- 0
> DF
         c  v total
1          10     0
2  running 15    15
3  running 30    45
4  running  2    47
5           5     0
6          17     0
7          47     0
8  running 12    12
9  running 57    69
10         87     0
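The grouping in the ave() call may not be obvious, so here is a small sketch of the intermediate values. The names status, blank_count and key are only illustrative. cumsum(status == "") goes up by one at every blank row, so all rows of one running streak share the counter value of the blank row just before them, and pairing that counter with the status itself keeps the streak apart from that blank row.

```r
status <- c("", "running", "running", "running", "", "", "",
            "running", "running", "")

# Counter that increases at every blank row
blank_count <- cumsum(status == "")
blank_count
# 1 1 1 1 2 3 4 4 4 5

# The two grouping variables combined into one key per row
key <- paste(status, blank_count, sep = "|")
key[status == "running"]
# "running|1" "running|1" "running|1" "running|4" "running|4"
```

As far as I can tell, this relies on streaks being separated by at least one blank row. With more than two status values, two streaks of the same status separated only by a different non-blank status would share a key, so a run-based id would be the safer choice there.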
We can use dplyr:
library(dplyr)
DF %>%
  group_by(cumsum(c == ""), c) %>%
  mutate(total = replace(cumsum(v), c == "", 0))
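A possible variant, assuming dplyr 1.1.0 or later is installed: consecutive_id() gives a run id much like data.table's rleid(), so the grouping can be written directly on runs of c. This is only a sketch, not the answer's own code; the helper column run and the result name res are illustrative, and the data frame is rebuilt here so the block runs on its own.

```r
library(dplyr)

DF <- data.frame(
  c = c("", "running", "running", "running", "", "", "",
        "running", "running", ""),
  v = c(10, 15, 30, 2, 5, 17, 47, 12, 57, 87),
  stringsAsFactors = FALSE
)

res <- DF %>%
  group_by(run = consecutive_id(c)) %>%            # one group per run of identical c
  mutate(total = if_else(c == "", 0, cumsum(v))) %>%
  ungroup() %>%
  select(-run)                                      # drop the helper column

res$total
# 0 15 45 47 0 0 0 12 69 0
```

The ungroup() at the end returns an ungrouped tibble, which avoids surprises if further steps are chained on.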