How to perform a group_by with elements that are contiguous in R and dplyr

Suppose we have this tibble:

 group item
 x     1
 x     2
 x     2
 y     3
 z     2
 x     2
 x     2
 z     1

I want to perform a group_by on group. However, I only want to group elements that are adjacent. For example, in my case I'd end up with two separate 'x' groups, each summing its own 'item' elements. The result would be something like:

group item
x 5
y 3
z 2
x 4
z 1

I know how to solve this problem using 'for' loops. However, that is slow and not very straightforward. I'd rather use a dplyr or tidyverse function with simple logic.

This question is not a duplicate. I know there's already a question about rle on SO, but mine is more general than that: I'm asking for general solutions.

If you want to use only base R + tidyverse, this code exactly replicates your desired results:

library(dplyr)  # provides tibble() and %>%

mydf <- tibble(group = c("x", "x", "x", "y", "z", "x", "x", "z"),
               item = c(1, 2, 2, 3, 2, 2, 2, 1))

mydf

# A tibble: 8 × 2
  group  item
  <chr> <dbl>
1     x     1
2     x     2
3     x     2
4     y     3
5     z     2
6     x     2
7     x     2
8     z     1

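# Run-length encode the group column: one entry per contiguous run
runs <- rle(mydf$group)

mydf %>%
  # Give every row the index of the run it belongs to
  mutate(run_id = rep(seq_along(runs$lengths), runs$lengths)) %>%
  group_by(group, run_id) %>%
  summarise(item = sum(item)) %>%
  # Restore the original order of the runs, then drop the helper column
  arrange(run_id) %>%
  select(-run_id)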

Source: local data frame [5 x 2]
Groups: group [3]

  group  item
  <chr> <dbl>
1     x     5
2     y     3
3     z     2
4     x     4
5     z     1
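
To see why the run_id trick works, it can help to look at what rle returns on its own. The following is only a minimal base R sketch using the same data as the question; the names run_sums and result are mine and not part of the answer above.

```r
# Same columns as in the question
group <- c("x", "x", "x", "y", "z", "x", "x", "z")
item  <- c(1, 2, 2, 3, 2, 2, 2, 1)

# rle() collapses the vector into contiguous runs
runs <- rle(group)
runs$values   # the value of each run
runs$lengths  # how many rows each run covers

# One id per run, repeated once for every row in that run
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Sum item within each run (base R stand-in for the summarise step)
run_sums <- tapply(item, run_id, sum)
result <- data.frame(group = runs$values, item = as.vector(run_sums))
print(result)
```

Because the ids increase with the position of each run, sorting by run_id is what keeps the two 'x' runs apart and in their original order.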

You can construct group identifiers with rle, but the easier route is to just use data.table::rleid, which does it for you:

library(dplyr)

# df holds the data from the question
df %>% 
    group_by(group, 
             group_run = data.table::rleid(group)) %>% 
    summarise_all(sum)
#> # A tibble: 5 x 3
#> # Groups:   group [?]
#>    group group_run  item
#>   <fctr>     <int> <int>
#> 1      x         1     5
#> 2      x         4     4
#> 3      y         2     3
#> 4      z         3     2
#> 5      z         5     1
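
If you would rather not pull in data.table, a run id can also be built with dplyr alone by counting how often the value changes from one row to the next. This is a sketch under a few assumptions: the names df, run_id and result are mine, and the .groups argument needs dplyr 1.0.0 or later. Recent dplyr releases (1.1.0 and later, as far as I know) also provide consecutive_id(), which should serve the same purpose.

```r
library(dplyr)

df <- tibble(group = c("x", "x", "x", "y", "z", "x", "x", "z"),
             item  = c(1, 2, 2, 3, 2, 2, 2, 1))

result <- df %>%
  # TRUE wherever the value differs from the previous row; the cumulative
  # sum of those changes is a counter that is constant within each run
  mutate(run_id = cumsum(group != dplyr::lag(group, default = first(group)))) %>%
  # Grouping by run_id first keeps the runs in their original order
  group_by(run_id, group) %>%
  summarise(item = sum(item), .groups = "drop") %>%
  select(-run_id)

print(result)
```

Putting run_id first in group_by means the summarised rows come back in run order rather than sorted by group, which is the order the question asks for.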
