Calculate each chunk by group using dplyr?

How can I get the expected calculation below using the dplyr package?

row value   group   expected
1   2       1       =NA
2   4       1       =4-2
3   5       1       =5-4
4   6       2       =NA
5   11      2       =11-6
6   12      1       =NA
7   15      1       =15-12

I tried:

library(dplyr)

df <- read.table(header = TRUE, text = '    row    value  group
1   2   1
2   4   1
3   5   1
4   6   2
5   11  2
6   12  1
7   15  1')

df %>% group_by(group) %>% mutate(expected = value - lag(value))

How can I do the calculation for each chunk (rows 1-3, 4-5, 6-7), even though rows 1-3 and 6-7 are labelled with the same group number?
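To make the problem concrete, here is a minimal base R sketch of what grouping by group alone computes. It uses ave() as a stand-in for group_by() plus lag(), which is an assumption made for illustration and not the code from the question. Because rows 1-3 and 6-7 share group 1, they are treated as one block, so row 6 gets 12 - 5 instead of NA.

```r
# Same data as in the question
df <- data.frame(row = 1:7,
                 value = c(2, 4, 5, 6, 11, 12, 15),
                 group = c(1, 1, 1, 2, 2, 1, 1))

# Difference to the previous value within each group label;
# c(NA, diff(v)) is equivalent to v - lag(v)
by_label <- ave(df$value, df$group, FUN = function(v) c(NA, diff(v)))

by_label
# NA 2 1 NA 5 7 3   (row 6 is 7, not the wanted NA)
```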

Your group variable alone cannot distinguish the chunks, so create a new variable aux that numbers each run of consecutive identical group values, and use it as the grouping variable:

library(dplyr)

# Number each run of consecutive identical group values: 1, 2, 3, ...
runs <- rle(df$group)
df$aux <- rep(seq_along(runs$values), times = runs$lengths)

df %>% group_by(aux) %>% mutate(expected = value - lag(value))

Source: local data frame [7 x 5]
Groups: aux

  row value group aux expected
1   1     2     1   1       NA
2   2     4     1   1        2
3   3     5     1   1        1
4   4     6     2   2       NA
5   5    11     2   2        5
6   6    12     1   3       NA
7   7    15     1   3        3
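The run-length step above can be inspected on its own. The following base R sketch shows what rle() returns for the group column and how rep() expands it into one identifier per row; the names runs and group are chosen here for illustration.

```r
# The group column from the question
group <- c(1, 1, 1, 2, 2, 1, 1)

# rle() compresses the vector into runs of identical consecutive values
runs <- rle(group)
runs$values   # 1 2 1  (the value of each run)
runs$lengths  # 3 2 2  (the length of each run)

# Number the runs and repeat each number over the length of its run
aux <- rep(seq_along(runs$values), times = runs$lengths)
aux           # 1 1 1 2 2 3 3
```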

Here is a similar approach. I created a new grouping variable using cumsum. Whenever two consecutive values of group differ (their difference is not 0), the running total increases by one, which starts a new group number. This approach may be helpful if you have more data.

library(dplyr)

# foo increases by one each time group changes from one row to the next
mutate(df, foo = cumsum(c(TRUE, diff(group) != 0))) %>%
  group_by(foo) %>%
  mutate(out = value - lag(value))

#  row value group foo out
#1   1     2     1   1  NA
#2   2     4     1   1   2
#3   3     5     1   1   1
#4   4     6     2   2  NA
#5   5    11     2   2   5
#6   6    12     1   3  NA
#7   7    15     1   3   3
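The cumsum expression is easier to follow when broken into its intermediate steps. This is a small base R sketch on the same group column; the variable names steps, changed and foo are illustrative.

```r
group <- c(1, 1, 1, 2, 2, 1, 1)

# Difference between each value and the one before it (one element shorter)
steps <- diff(group)
steps     # 0 0 1 0 -1 0

# TRUE where a new chunk starts; the first row always starts a chunk
changed <- c(TRUE, steps != 0)
changed   # TRUE FALSE FALSE TRUE FALSE TRUE FALSE

# The running count of chunk starts gives the chunk number for each row
foo <- cumsum(changed)
foo       # 1 1 1 2 2 3 3
```

Note that this relies on group being numeric, since diff() does not work on character or factor labels.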

Here is an option using data.table 1.9.5. The development version introduced the new functions rleid and shift (for shift, the default type is "lag" and the default fill is NA), which are useful for this.

library(data.table)
setDT(df)[, expected:=value-shift(value) ,by = rleid(group)][]
#     row value group expected
#1:   1     2     1       NA
#2:   2     4     1        2
#3:   3     5     1        1
#4:   4     6     2       NA
#5:   5    11     2        5
#6:   6    12     1       NA
#7:   7    15     1        3
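For readers who want to stay within dplyr, a comparable sketch is possible with consecutive_id(), which plays the same role as rleid. This assumes dplyr 1.1.0 or later, where that function is available; it is not part of the answers above, and the column name chunk is made up for illustration.

```r
library(dplyr)

df <- data.frame(row = 1:7,
                 value = c(2, 4, 5, 6, 11, 12, 15),
                 group = c(1, 1, 1, 2, 2, 1, 1))

# consecutive_id() assigns a new id each time group changes between rows
# (assumes dplyr >= 1.1.0)
result <- df %>%
  group_by(chunk = consecutive_id(group)) %>%
  mutate(expected = value - lag(value)) %>%
  ungroup()

result$expected
# NA 2 1 NA 5 NA 3
```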
