R data.table remove rows conditionally among groups
I have this example dataset. The actual data has millions of rows, so I'd prefer a data.table solution, but a tidyverse solution would also be fine:
dat1 = data.frame(name = c("X1", "X1", "X1", "X2", "X2", "X2", "X2", "X2", "X2"),
year = c(2015,2016,2017,2015,2016,2016,2017,2017, 2018),
choice = c("o","o","o","o","o","r","r","o","o")
)
dat1
The logic I need to apply is:
If, for any name and year combination, only choice "o" exists, retain the row with "o".
If, for any name and year combination, both choices "o" and "r" exist, retain the row with "r" and drop the row with "o".
I don't want to hard-code the name and year combinations.
Does this work:
library(dplyr)
dat1 %>% group_by(name ,year) %>% filter(all(choice == 'o' )|choice == 'r')
# A tibble: 7 x 3
# Groups: name, year [7]
#   name   year choice
#   <chr> <dbl> <chr>
# 1 X1     2015 o
# 2 X1     2016 o
# 3 X1     2017 o
# 4 X2     2015 o
# 5 X2     2016 r
# 6 X2     2017 r
# 7 X2     2018 o
library(data.table)
setDT(dat1)
dat1[, .SD[all(choice == "o") | choice == "r",], by = .(name, year)]
# name year choice
# 1: X1 2015 o
# 2: X1 2016 o
# 3: X1 2017 o
# 4: X2 2015 o
# 5: X2 2016 r
# 6: X2 2017 r
# 7: X2 2018 o
(I generated this before looking at KarthikS's answer, but the logic and the results are identical.)
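Since the real data has millions of rows, it may be worth trying a variant of the same rule that collects row numbers with .I and subsets once, instead of subsetting .SD inside every group. This is only a sketch on the example data; the names keep and res are illustrative and not part of the original answers, and whether it is actually faster on the real data should be measured rather than assumed.

```r
library(data.table)

# Example data from the question
dat1 <- data.table(
  name   = c("X1", "X1", "X1", "X2", "X2", "X2", "X2", "X2", "X2"),
  year   = c(2015, 2016, 2017, 2015, 2016, 2016, 2017, 2017, 2018),
  choice = c("o", "o", "o", "o", "o", "r", "r", "o", "o")
)

# Within each (name, year) group, collect the row numbers to keep:
# every row if the group holds only "o", otherwise only the "r" rows.
keep <- dat1[, .I[all(choice == "o") | choice == "r"], by = .(name, year)]$V1

# Subset the table once using the collected row numbers
res <- dat1[keep]
res
```

The filter condition is the same as in the .SD version above; only the way the rows are extracted differs.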
Another option is to convert the column to a factor with levels specified in a custom order ("r" before "o"), drop the unused levels within each group with droplevels, and then keep the rows that match the first remaining level:
library(dplyr)
dat1 %>%
group_by(name, year) %>%
filter(choice %in% levels(droplevels(factor(choice,
levels = c('r', 'o'))))[1])
# A tibble: 7 x 3
# Groups: name, year [7]
# name year choice
# <chr> <dbl> <chr>
#1 X1 2015 o
#2 X1 2016 o
#3 X1 2017 o
#4 X2 2015 o
#5 X2 2016 r
#6 X2 2017 r
#7 X2 2018 o
An equivalent option with data.table is:
library(data.table)
setDT(dat1)[dat1[, .I[choice %in%
levels(droplevels(factor(choice,
levels = c('r', 'o'))))[1]], .(name, year)]$V1]
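As a quick sanity check on the example data, the two formulations (the explicit all/"r" rule and the factor-level approach) can be compared by the row numbers they select. This is a sketch; idx_rule and idx_factor are illustrative names, not taken from the answers.

```r
library(data.table)

# Example data from the question
dat1 <- data.table(
  name   = c("X1", "X1", "X1", "X2", "X2", "X2", "X2", "X2", "X2"),
  year   = c(2015, 2016, 2017, 2015, 2016, 2016, 2017, 2017, 2018),
  choice = c("o", "o", "o", "o", "o", "r", "r", "o", "o")
)

# Row numbers selected by the explicit rule
idx_rule <- dat1[, .I[all(choice == "o") | choice == "r"],
                 by = .(name, year)]$V1

# Row numbers selected by the factor-level approach
idx_factor <- dat1[, .I[choice %in%
                          levels(droplevels(factor(choice,
                                                   levels = c("r", "o"))))[1]],
                   by = .(name, year)]$V1

# TRUE when both approaches keep exactly the same rows
identical(idx_rule, idx_factor)
```

Agreement on this small example does not prove the two are equivalent in general. If choice could contain values other than "o" and "r", the explicit rule would drop groups that have neither all "o" nor any "r", while the factor version would treat the other values as NA, so the two should be re-checked on the real data.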