R data.table: conditionally drop rows within groups

Question

I have the example dataset below. The actual data has millions of rows, so I'd prefer a data.table solution, but a tidyverse solution would also be fine:

dat1 = data.frame(name = c("X1", "X1", "X1", "X2", "X2", "X2", "X2", "X2", "X2"), 
              year = c(2015,2016,2017,2015,2016,2016,2017,2017, 2018),
              choice = c("o","o","o","o","o","r","r","o","o")
)
dat1

The logic I need to apply is:

If, for any name and year combination, only choice "o" exists, retain the row with "o".

If, for any name and year combination, both choices "o" and "r" exist, retain the row with "r" and drop the row with "o". I don't want to hard-code specific name and year combinations.

Answer 1

Does this work:

library(dplyr)
dat1 %>%
  group_by(name, year) %>%
  filter(all(choice == "o") | choice == "r")
# A tibble: 7 x 3
# Groups:   name, year [7]
#   name   year choice
#   <chr> <dbl> <chr> 
# 1 X1     2015 o     
# 2 X1     2016 o     
# 3 X1     2017 o     
# 4 X2     2015 o     
# 5 X2     2016 r     
# 6 X2     2017 r     
# 7 X2     2018 o
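
To see why that filter condition does what the question asks, here is a small base R sketch of how it evaluates inside a single group. The variable names (both, only_o, keep_both, keep_only_o) are made up for illustration. The single TRUE/FALSE from all() is recycled against the per-row comparison.

```r
# A group holding both choices: all() is FALSE, so only the "r" row survives.
both <- c("o", "r")
keep_both <- all(both == "o") | both == "r"
keep_both
# [1] FALSE  TRUE

# A group holding only "o": all() is TRUE, so every row survives.
only_o <- c("o", "o")
keep_only_o <- all(only_o == "o") | only_o == "r"
keep_only_o
# [1] TRUE TRUE
```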

Answer 2

library(data.table)
setDT(dat1)
dat1[, .SD[all(choice == "o") | choice == "r",], by = .(name, year)]
#    name year choice
# 1:   X1 2015      o
# 2:   X1 2016      o
# 3:   X1 2017      o
# 4:   X2 2015      o
# 5:   X2 2016      r
# 6:   X2 2017      r
# 7:   X2 2018      o

(I wrote this before looking at KarthikS's answer, but the logic and the results are identical.)
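
Since the real data has millions of rows, it may be worth trying a variant that avoids subsetting .SD in every group. The sketch below is not from the answers above. It adds a temporary helper column (the name has_r is made up) and then filters once. It assumes choice only ever takes the values "o" and "r"; whether it is actually faster on your data is something to benchmark.

```r
library(data.table)

dat1 <- data.table(
  name   = c("X1", "X1", "X1", "X2", "X2", "X2", "X2", "X2", "X2"),
  year   = c(2015, 2016, 2017, 2015, 2016, 2016, 2017, 2017, 2018),
  choice = c("o", "o", "o", "o", "o", "r", "r", "o", "o")
)

# Flag every (name, year) group that contains at least one "r".
dat1[, has_r := any(choice == "r"), by = .(name, year)]

# Keep the "r" rows, plus every row of the groups that have no "r" at all.
res <- dat1[choice == "r" | !has_r]

# Drop the helper column from both the result and the original table.
res[, has_r := NULL]
dat1[, has_r := NULL]

res
```

The result keeps the original row order, so it should match the 7 rows shown above.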

Answer 3

Another option is to convert the column to a factor with the levels specified in a custom order ('r' before 'o'), drop the unused levels with droplevels, and then keep only the rows matching the first remaining level:

library(dplyr)
dat1 %>%
     group_by(name, year) %>%
     filter(choice %in% levels(droplevels(factor(choice, 
           levels = c('r', 'o'))))[1])
# A tibble: 7 x 3
# Groups:   name, year [7]
#  name   year choice
#  <chr> <dbl> <chr> 
#1 X1     2015 o     
#2 X1     2016 o     
#3 X1     2017 o     
#4 X2     2015 o     
#5 X2     2016 r     
#6 X2     2017 r     
#7 X2     2018 o

An equivalent option with data.table is:

library(data.table)
setDT(dat1)[dat1[, .I[choice %in% 
       levels(droplevels(factor(choice, 
           levels = c('r', 'o'))))[1]], .(name, year)]$V1]
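
For anyone wondering how the factor trick picks the right value, here is a small base R sketch of what happens inside one group. The variable names (both, only_o) are made up for illustration.

```r
# Group containing both values: "r" is listed first and is present, so it wins.
both <- factor(c("o", "r"), levels = c("r", "o"))
levels(droplevels(both))[1]
# [1] "r"

# Group containing only "o": the unused "r" level is dropped, leaving "o" first.
only_o <- factor(c("o", "o"), levels = c("r", "o"))
levels(droplevels(only_o))[1]
# [1] "o"
```

The same idea extends to more than two choices by listing them in priority order in levels.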

3 answers

Answer 1: score 3, 2020-11-25 13:27:56
Answer 2: score 3, accepted, 2020-11-25 13:40:29
Answer 3: score 0, 2020-11-25 21:14:41
