简体   繁体   English

data.table 中的子集使用两列的条件

[英]Subset groups in a data.table using conditions on two columns

I have a data.table with a high number of groups.我有一个 data.table,有很多组。 I would like to subset whole groups (not just rows) based on the conditions on multiple columns.我想根据多列的条件对整个组(而不仅仅是行)进行子集化。 Consider the following data.table:考虑以下 data.table:

DT <- structure(list(id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L), 
                 group = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"), 
                 y = c(14, 19, 16, 10, 6, 8, 14, 19, 10, 9, 6, 8), 
                 x = c(3, 3, 2, 3, 3, 3, 3, 2, 2, 3, 3, 3)), 
            row.names = c(NA, -12L),
            class = c("data.table", "data.frame"))
>DT

    id group  y x
 1:  1     A 14 3
 2:  2     A 19 3
 3:  3     A 16 2
 4:  4     A 10 3
 5:  5     B  6 3
 6:  6     B  8 3
 7:  7     B 14 3
 8:  8     B 19 2
 9:  9     C 10 2
10: 10     C  9 3
11: 11     C  6 3
12: 12     C  8 3

I would like to keep groups that have y=6 and x=3 in the same row.我想将y=6x=3的组放在同一行中。 So that I would have only class B and C (preferably using data.table package in R):这样我就只有 class B 和 C(最好在 R 中使用data.table package):

    id group  y x
 1:  5     B  6 3
 2:  6     B  8 3
 3:  7     B 14 3
 4:  8     B 19 2
 5:  9     C 10 2
 6: 10     C  9 3
 7: 11     C  6 3
 8: 12     C  8 3

All my attempts gave me only those rows containing y=6 and x=3 , which I do not want:我所有的尝试只给了我那些包含y=6x=3的行,这是我不想要的:

    id group  y x
 1:  5     B  6 3
 2: 11     C  6 3

With data.table :使用data.table

DT[,.SD[any(x == 3 & y == 6)], by=group]

    group    id     y     x
   <char> <int> <num> <num>
1:      B     5     6     3
2:      B     6     8     3
3:      B     7    14     3
4:      B     8    19     2
5:      C     9    10     2
6:      C    10     9     3
7:      C    11     6     3
8:      C    12     8     3

Another possibly faster option:另一个可能更快的选择:

DT[, if (any(x == 3 & y == 6)) .SD, by=group]

Try dplyr package试试 dplyr package

#select groups containing y and x
groups = DT %>% filter(y == 6, x == 3) %>% select(group) %>% unique() %>% unlist() %>% as.vector()   

# filter for selected groups
DT %>% filter(group %in% groups)

A data.table option data.table选项

> DT[group %in% DT[.(3, 6), group, on = .(x, y)]]
   id group  y x
1:  5     B  6 3
2:  6     B  8 3
3:  7     B 14 3
4:  8     B 19 2
5:  9     C 10 2
6: 10     C  9 3
7: 11     C  6 3
8: 12     C  8 3

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM