Remove duplicates based on second column

Question

I am trying to write a section of code that does a few things: 1) group the dataset by ID; 2) count the number of unique months in column data.month; 3) remove all IDs that have fewer than 9 months; 4) print distinct IDs based on the company (i.e. print an ID twice if it is related to 2 companies); 5) remove duplicated IDs and keep the record that has the highest data.month number.

I have the code working until 5). I can't get my code to print only the record (row) with the highest month number for each duplicated ID.

I looked at a few examples here:

R remove duplicates based on other columns

Remove duplicates based on 2nd column condition

I can figure out how to remove duplicates, but I'm having trouble applying it to my circumstances.

These are the two pieces of code I have tried to achieve my goal:

data.check6 <- bind %>%
  group_by(bind$ABN) %>%
  summarise(count = n_distinct(data.month)) %>%
  filter(count>8) %>%
  arrange(bind$data.month) %>%
  filter(row_number() == 1)

and:

 library(tidyverse)

 data.check7 <- bind %>%
  group_by(ABN)%>%      
  filter(1 == length(unique(bind$data.month)), !duplicated(bind$data.month))

Right now, I get the error:

Error in arrange_impl(.data, dots) : incorrect size (345343) at position 1, expecting : 3749

In the end I would like to have a dataset where each ID appears only once, with the record associated with its highest month (i.e. column value = 12).

Answer 1

I think you're looking for something like this:

Example data:

> bind <- data.frame(ABN = rep(1:3, 3),
+                    data.month = sample(1:12, 9),
+                    other.inf = runif(9))
> 
> bind
  ABN data.month other.inf
1   1         10 0.8102867
2   2          4 0.2919716
3   3          8 0.3391790
4   1          2 0.3698933
5   2          6 0.9155280
6   3          1 0.2680165
7   1          9 0.7541168
8   2          7 0.2018796
9   3         11 0.1546079

Solution:

> bind %>%
+   group_by(ABN) %>%      
+   filter(data.month == max(data.month))
# A tibble: 3 x 3
# Groups:   ABN [3]
    ABN data.month other.inf
  <int>      <int>     <dbl>
1     1         10     0.810
2     2          7     0.202
3     3         11     0.155
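
For the full set of steps in the question, a minimal sketch might look like the block below. It assumes dplyr 1.0.0 or later (for slice_max) and uses made-up toy data; only the column names ABN and data.month come from the question. The error in the question most likely comes from writing bind$data.month inside the pipeline: that pulls the whole original column (345343 values) rather than the column of the summarised data (3749 rows), so the sizes no longer match. Using the bare column name avoids this and lets dplyr evaluate it per group.

```r
library(dplyr)

# Hypothetical toy data: ABN 1 has 12 months, ABN 2 has 5, ABN 3 has 10.
bind <- data.frame(
  ABN        = c(rep(1, 12), rep(2, 5), rep(3, 10)),
  data.month = c(1:12, 1:5, 1:10),
  other.inf  = seq_len(27)
)

result <- bind %>%
  group_by(ABN) %>%
  # Keep only IDs observed in at least 9 distinct months.
  # The bare name data.month is evaluated within each group.
  filter(n_distinct(data.month) >= 9) %>%
  # Keep exactly one row per ID: the one with the highest month.
  # with_ties = FALSE drops extra rows if the maximum month repeats.
  slice_max(data.month, n = 1, with_ties = FALSE) %>%
  ungroup()

print(result)
```

Unlike filter(data.month == max(data.month)), which returns every row tied for the maximum, slice_max with with_ties = FALSE returns a single row per ID. If an ID should be kept once per company, adding the company column to group_by before the slice_max step is one way to do it.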

Answer 1
1 Accepted 2019-05-23 08:20:05
