Filtering a dataset with duplicated rows in R

Question

I need to filter a dataset based on two conditions.

Here is what my dataset looks like:

df <- data.frame(
  id = c(1,2,2,3,3,4,5,5),
  district = c(10,10,11,12,12,13,14,15),
  value = c(10.2, 10.8, 10.8, 7.5, 9.3, 6, 7.0, 7.0))


> df
  id district value
1  1       10  10.2
2  2       10  10.8
3  2       11  10.8
4  3       12   7.5
5  3       12   9.3
6  4       13   6.0
7  5       14   7.0
8  5       15   7.0

I have rows that are duplicated by id. To keep only the desired row for each id:

First, for ids with multiple districts but the same value, I need to keep the first row.
Second, for ids with multiple values from the same district, I need the row with the max value.

So the desired filtered dataset is:

> df
  id district value
1  1       10  10.2
2  2       10  10.8
3  3       12   9.3
4  4       13   6.0
5  5       14   7.0

So far I have only been able to locate the duplicated ids:

df[duplicated(df$id),]
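Note that duplicated() flags only the second and later occurrences of each id, so the line above does not show the first row of each duplicated group. A minimal sketch for listing every row that belongs to a duplicated id, combining a forward pass with a backward pass via fromLast = TRUE (the names dup_ids and dup_rows are just illustrative):

```r
df <- data.frame(
  id = c(1,2,2,3,3,4,5,5),
  district = c(10,10,11,12,12,13,14,15),
  value = c(10.2, 10.8, 10.8, 7.5, 9.3, 6, 7.0, 7.0))

# TRUE for every row whose id occurs more than once, including the
# first occurrence (which duplicated() on its own leaves as FALSE)
dup_ids <- duplicated(df$id) | duplicated(df$id, fromLast = TRUE)

# All rows for ids 2, 3 and 5
dup_rows <- df[dup_ids, ]
dup_rows
```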

Does anyone have any ideas? Thanks

Answer 1

With dplyr:

library(dplyr)

df %>% 
  group_by(id) %>%
  arrange(desc(value)) %>%
  slice(1)
# # A tibble: 5 x 3
# # Groups:   id [5]
#      id district value
#   <dbl>    <dbl> <dbl>
# 1     1       10  10.2
# 2     2       10  10.8
# 3     3       12   9.3
# 4     4       13   6  
# 5     5       14   7

There's no real need to handle the two cases separately (taking the max value when an id has multiple values, and keeping the first row when its values are duplicated). If we sort the data by value in descending order and keep the first row in each id group, one piece of logic accomplishes both tasks.
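The same "largest value first, then take one row per id" idea can also be written with dplyr's slice_max(), which needs dplyr 1.0.0 or later. This is a swapped-in alternative to the arrange() + slice(1) pipeline above, not part of the original answer; with_ties = FALSE is assumed here so that exactly one row is returned per id even when values tie.

```r
library(dplyr)

df <- data.frame(
  id = c(1,2,2,3,3,4,5,5),
  district = c(10,10,11,12,12,13,14,15),
  value = c(10.2, 10.8, 10.8, 7.5, 9.3, 6, 7.0, 7.0))

# For each id, keep the single row with the largest value.
# with_ties = FALSE returns exactly one row per group; among tied
# values, the row that appears first in the data is kept.
res <- df %>%
  group_by(id) %>%
  slice_max(value, n = 1, with_ties = FALSE) %>%
  ungroup()
res
```

For ids 2 and 5, where the value is tied across two districts, this keeps districts 10 and 14, matching the desired output.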

Answer 2

library(dplyr)

df %>%
  arrange(id, -value) %>%
  distinct(id, district, .keep_all = TRUE) %>%
  distinct(id, value, .keep_all = TRUE)

  id district value
1  1       10  10.2
2  2       10  10.8
3  3       12   9.3
4  4       13   6.0
5  5       14   7.0

First we sort by id and then by value in descending order; then we call distinct() twice to keep unique combinations, first of id and district, then of id and value.
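To see what each step of this pipeline removes, here is a sketch that runs it one step at a time on the example data (the names step1, step2 and step3 are only for illustration). distinct() keeps the first row of each combination, which is why the sort has to come first.

```r
library(dplyr)

df <- data.frame(
  id = c(1,2,2,3,3,4,5,5),
  district = c(10,10,11,12,12,13,14,15),
  value = c(10.2, 10.8, 10.8, 7.5, 9.3, 6, 7.0, 7.0))

# Step 1: sort by id, then by value from largest to smallest
step1 <- df %>% arrange(id, -value)

# Step 2: one row per (id, district); because of the sort, the kept row
# has the largest value in that district (drops id 3 with value 7.5)
step2 <- step1 %>% distinct(id, district, .keep_all = TRUE)

# Step 3: one row per (id, value); when an id has the same value in
# several districts, the first district is kept (drops id 2 / district 11
# and id 5 / district 15)
step3 <- step2 %>% distinct(id, value, .keep_all = TRUE)

c(nrow(step1), nrow(step2), nrow(step3))  # 8 7 5
```

This relies on every duplicated id matching one of the two patterns in the question: an id with several districts and several different values (for example districts 20 and 21 with values 5 and 4) would pass both distinct() calls and keep more than one row.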

Answer 3

In base R, we can use duplicated() after ordering the rows:

df1 <- df[order(df$id, -df$value),]
df1[!duplicated(df1$id),]
#  id district value
#1  1       10  10.2
#2  2       10  10.8
#5  3       12   9.3
#6  4       13   6.0
#7  5       14   7.0
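The base R approach can be wrapped in a small reusable function. This is a sketch only: the name keep_top_per_id is hypothetical, and resetting the row names is an extra step added here so the result is numbered 1 to 5 like the desired output in the question.

```r
df <- data.frame(
  id = c(1,2,2,3,3,4,5,5),
  district = c(10,10,11,12,12,13,14,15),
  value = c(10.2, 10.8, 10.8, 7.5, 9.3, 6, 7.0, 7.0))

# Hypothetical helper: keep one row per id, choosing the row with the
# largest value; ties keep the row that appears first in the data.
keep_top_per_id <- function(data) {
  # order() leaves unresolved ties in their original order,
  # so rows with the same id and value keep their input order
  sorted <- data[order(data$id, -data$value), ]
  # duplicated() is FALSE for the first row of each id, i.e. the max value
  out <- sorted[!duplicated(sorted$id), ]
  # Renumber the rows 1..n
  rownames(out) <- NULL
  out
}

res <- keep_top_per_id(df)
res
```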


3 answers

Answer 1
Score 3 (accepted), 2020-11-02 18:12:53

Answer 2
Score 1, 2020-11-02 18:15:21

Answer 3
Score 1, 2020-11-02 21:15:03
