
R data.table group by continuous values

I need some help with grouping data by runs of consecutive equal values.

If I have this data.table:

dt <- data.table::data.table( a = c(1,1,1,2,2,2,2,1,1,2), b = 1:10, c = 1:10 + 1 )
 
    a  b  c
 1: 1  1  2
 2: 1  2  3
 3: 1  3  4
 4: 2  4  5
 5: 2  5  6
 6: 2  6  7
 7: 2  7  8
 8: 1  8  9
 9: 1  9 10
10: 2 10 11

I need a group for every run of consecutive equal values in column a. From each group I need the first (also the minimum possible) value of column b and the last (also the maximum possible) value of column c.

Like this:

   a  b  c
1: 1  1  4
2: 2  4  8
3: 1  8 10
4: 2 10 11

Thank you very much for your help. I can't get it solved on my own.

We can probably try rleid(), which gives a new group id each time the value of a changes (with data.table attached):

> dt[, .(a = a[1], b = b[1], c = c[.N]), rleid(a)][, -1]
   a  b  c
1: 1  1  4
2: 2  4  8
3: 1  8 10
4: 2 10 11
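
In case the one-liner is hard to read, here is a step-by-step sketch of the same idea. The helper column name `run` and the result name `res` are my own, not part of the question; the rest uses only `rleid()`, `.N` and `:=` from data.table.

```r
library(data.table)

dt <- data.table(a = c(1, 1, 1, 2, 2, 2, 2, 1, 1, 2), b = 1:10, c = 1:10 + 1)

# rleid() starts a new integer id every time `a` changes value,
# so equal values that are not adjacent end up in different groups.
dt[, run := rleid(a)]
dt$run
# 1 1 1 2 2 2 2 3 3 4

# Per run: first a, first b, last c (.N is the number of rows in the group).
res <- dt[, .(a = a[1], b = b[1], c = c[.N]), by = run]

# Drop the helper column (hypothetical name `run`) from the result.
res[, run := NULL]
res
```

Grouping by `a` alone would merge rows 1-3 with rows 8-9, which is why the run id is needed.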

An option with dplyr:

library(dplyr)
dt %>% 
  group_by(grp = cumsum(c(TRUE, diff(a) != 0))) %>%
  summarise(across(a:b, first), c = last(c)) %>%
  select(-grp)

Output:

# A tibble: 4 × 3
      a     b     c
  <dbl> <int> <dbl>
1     1     1     4
2     2     4     8
3     1     8    10
4     2    10    11
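
The grouping key in the dplyr answer, `cumsum(c(TRUE, diff(a) != 0))`, can be checked on its own in base R. This is only a sketch to show how the key is built; the names `changed`, `grp`, `b_col`, `c_col` and `out` are made up for illustration, and it assumes `a` is numeric so that `diff()` applies.

```r
a     <- c(1, 1, 1, 2, 2, 2, 2, 1, 1, 2)
b_col <- 1:10
c_col <- 1:10 + 1

# TRUE at the first row and wherever the value differs from the previous row.
changed <- c(TRUE, diff(a) != 0)

# The running count of changes is the run id.
grp <- cumsum(changed)
grp
# 1 1 1 2 2 2 2 3 3 4

# A run ends where the next row starts a new run, or at the last row.
is_last <- c(changed[-1], TRUE)

# First a and b of each run, last c of each run.
out <- data.frame(a = a[changed], b = b_col[changed], c = c_col[is_last])
out
```

For this data `grp` matches what `rleid(a)` returns, so both answers group the rows the same way.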
