Filter duplicated rows that have non-matching variable values in R

I am trying to filter a dataset in which some ids are duplicated. For a duplicated id, I need to keep only the row whose values do not match.

Here is the sample dataset.

df <- data.frame(
         id =  c(1,2,2,3,4,5,5,6),
         cat = c(3,3,4,5,2,2,1,5),
  actual.cat = c(3,4,4,5,2,1,1,7))

> df
  id cat actual.cat
1  1   3          3
2  2   3          4
3  2   4          4
4  3   5          5
5  4   2          2
6  5   2          1
7  5   1          1
8  6   5          7

So, each id has a cat and an actual.cat. When an id is duplicated, I need to keep only the row where cat does not match actual.cat.

Here is what I need.

> df
  id cat actual.cat
   1   3          3
   2   3          4
   3   5          5
   4   2          2
   5   2          1
   6   5          7

Any ideas on this?

Thanks!

We can group by id and then filter:

library(dplyr)
df %>%
     group_by(id) %>%
     # keep mismatching rows of duplicated ids, plus all rows of unique ids
     filter((n() > 1 & cat != actual.cat) | n() == 1)

Output:

# A tibble: 6 x 3
# Groups:   id [6]
#     id   cat actual.cat
#  <dbl> <dbl>      <dbl>
#1     1     3          3
#2     2     3          4
#3     3     5          5
#4     4     2          2
#5     5     2          1
#6     6     5          7

Or using base R:

subset(df, (id %in% names(which(table(id) > 1)) & cat != actual.cat) |
     id %in% names(which(table(id) == 1)))

In base R, you can use subset with ave to keep a row when its id group has only one row or when cat is not equal to actual.cat.

subset(df, ave(cat != actual.cat, id, FUN = function(x) length(x) == 1 | x))

#  id cat actual.cat
#1  1   3          3
#2  2   3          4
#4  3   5          5
#5  4   2          2
#6  5   2          1
#8  6   5          7
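
The ave call above packs both conditions into one function. As a rough sketch of the same logic with the intermediate steps spelled out (the names grp_size and mismatch are introduced here for illustration only and are not part of the original answer):

```r
# Sample data from the question
df <- data.frame(
  id = c(1, 2, 2, 3, 4, 5, 5, 6),
  cat = c(3, 3, 4, 5, 2, 2, 1, 5),
  actual.cat = c(3, 4, 4, 5, 2, 1, 1, 7)
)

# Number of rows sharing each row's id (hypothetical helper name)
grp_size <- ave(df$id, df$id, FUN = length)

# TRUE where cat differs from actual.cat (hypothetical helper name)
mismatch <- df$cat != df$actual.cat

# Keep every row of a unique id, and only the mismatching rows of a duplicated id
result <- df[grp_size == 1 | mismatch, ]
result
```

Note that under this rule a duplicated id whose rows all have cat equal to actual.cat would be dropped entirely; the sample data contains no such id, so whether that is the desired behavior is an open assumption.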

You can also write this logic in data.table:

library(data.table)
setDT(df)[, .SD[.N == 1 | cat != actual.cat], id]
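
The .SD[...] form subsets the data group by group. A commonly used alternative, sketched here on the assumption that only the row filter matters, collects the matching row numbers with .I and subsets the table once, which may be faster on large tables. The names dt and keep are introduced for illustration and do not appear in the original answer.

```r
library(data.table)

# Sample data from the question, built directly as a data.table
dt <- data.table(
  id = c(1, 2, 2, 3, 4, 5, 5, 6),
  cat = c(3, 3, 4, 5, 2, 2, 1, 5),
  actual.cat = c(3, 4, 4, 5, 2, 1, 1, 7)
)

# .I holds the original row numbers of the current group;
# the unnamed result column is called V1 by default
keep <- dt[, .I[.N == 1 | cat != actual.cat], by = id]$V1

# Subset the full table once using the collected row numbers
result <- dt[keep]
result
```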
