Loop to sum up observation greater than subject in R

I have a data set that looks like this:

set.seed(100)
da <- data.frame(exp = c(rep("A", 4), rep("B", 4)), diam = runif(8, 10, 30))

For each row in the data set, I want to sum the observations (diam) that are larger than the diam in that row and belong to the same level of exp. To do that, I wrote a loop:

da$d2 <- 0
for (i in seq_len(nrow(da))) {
  for (j in seq_len(nrow(da))) {
    # add diam[j] when it is larger than diam[i] and in the same exp level
    if (da$diam[i] < da$diam[j] && da$exp[i] == da$exp[j]) {
      da$d2[i] <- da$d2[i] + da$diam[j]
    }
  }
}

The loop works fine and I get these results:

  exp     diam       d2
1   A 16.15532 21.04645
2   A 15.15345 37.20177
3   A 21.04645  0.00000
4   A 11.12766 52.35522
5   B 19.37099 45.92347
6   B 19.67541 26.24805
7   B 26.24805  0.00000
8   B 17.40641 65.29445

However, my real data set is much bigger than that (> 40000 rows and > 100 exp levels), so the loop is very slow. I hope there is a function that can do this calculation faster.

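Once the rows are sorted by diam in descending order within each exp, the values larger than a given row are exactly the rows above it, so the running total minus the row's own diam gives the sum you want. If you don't need the result in the original row order, you can do it quite efficiently like this: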

library(data.table)
setorder(setDT(da), exp, -diam)
da[, d2 := cumsum(diam) - diam, by = exp]

da
#   exp     diam       d2
#1:   A 21.04645  0.00000
#2:   A 16.15532 21.04645
#3:   A 15.15345 37.20177
#4:   A 11.12766 52.35522
#5:   B 26.24805  0.00000
#6:   B 19.67541 26.24805
#7:   B 19.37099 45.92347
#8:   B 17.40641 65.29445

Using dplyr, that would be:

library(dplyr)
da %>%
  arrange(exp, desc(diam)) %>%
  group_by(exp) %>%
  mutate(d2 = cumsum(diam) - diam)
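
If the original row order matters, the same sort-and-accumulate idea can be sketched in base R. This is only an illustration, not part of the answer above: `sum_larger` is a hypothetical helper name, and it uses `findInterval()` so that tied diam values are not counted as "larger" (with the sort-then-cumsum approach, ties within a group would be added to each other).

```r
set.seed(100)
da <- data.frame(exp = c(rep("A", 4), rep("B", 4)), diam = runif(8, 10, 30))

# Hypothetical helper: for each element of x, the sum of all elements
# of x that are strictly greater than it.
sum_larger <- function(x) {
  s <- sort(x)                           # ascending order
  # suffix[k] is the sum of s[k], ..., s[n]; the trailing 0 covers the maximum
  suffix <- c(rev(cumsum(rev(s))), 0)
  k <- findInterval(x, s)                # number of values <= x[i]
  suffix[k + 1]                          # sum of the values > x[i]
}

# ave() applies the helper within each exp level and keeps the row order
da$d2 <- ave(da$diam, da$exp, FUN = sum_larger)
da
```

On the example data this should give the same d2 column as the loop in the question, with the rows left in their original positions. The cost is one sort per group rather than a comparison of every pair of rows.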
