Trying to keep values of a column according to the unique values of two other columns

Question

I want to keep only the 2 largest values in a column of a df according to the unique pair of values in two other columns. For example, I have this df:

df <- data.frame('ID' = c(1,1,1,2,2,3,4,4,4,5),
                 'YEAR' = c(2002,2002,2003,2002,2003,2005,2010,2011,2012,2008),
                 'WAGES' = c(100,98,60,120,80,300,50,40,30,500));

I want to drop the 3rd and 9th rows or, equivalently, keep only the two largest values in the WAGES column. The df has roughly 300,000 rows.

Answer 1

You can use dplyr's top_n:

library(dplyr)

df %>% 
  group_by(ID) %>% 
  top_n(n = 2, wt = WAGES) 

## A tibble: 8 x 3
## Groups:   ID [5]
#     ID  YEAR WAGES
#  <dbl> <dbl> <dbl>
#1     1  2002   100
#2     1  2002    98
#3     2  2002   120
#4     2  2003    80
#5     3  2005   300
#6     4  2010    50
#7     4  2011    40
#8     5  2008   500
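
In dplyr 1.0.0 and later, top_n() is superseded by slice_max(). The following is a minimal sketch of the same per-ID selection, assuming a recent dplyr version; the name top2 is only illustrative. Setting with_ties = FALSE caps each ID at two rows even when WAGES values are tied, whereas top_n() keeps ties.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 4, 4, 4, 5),
                 YEAR = c(2002, 2002, 2003, 2002, 2003, 2005, 2010, 2011, 2012, 2008),
                 WAGES = c(100, 98, 60, 120, 80, 300, 50, 40, 30, 500))

# Keep at most the two largest WAGES rows within each ID
top2 <- df %>%
  group_by(ID) %>%
  slice_max(order_by = WAGES, n = 2, with_ties = FALSE) %>%
  ungroup()

top2
```

If the grouping should be the pair of columns rather than ID alone, group_by(ID, YEAR) can be used instead, although on this example that would not drop the 3rd row.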

Answer 2

If I understood your question correctly, using base R:

for (i in 1:2) {
    max_row <- which.max(df$WAGES)
    df <- df[-c(max_row), ]
}

df

#   ID YEAR WAGES
# 1  1 2002   100
# 2  1 2002    98
# 3  1 2003    60
# 4  2 2002   120
# 5  2 2003    80
# 7  4 2010    50
# 8  4 2011    40
# 9  4 2012    30

Note the - and the , in df <- df[-c(max_row), ]: the minus sign drops the row at that index, and the trailing comma keeps all columns.
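
The loop above removes the two rows with the largest WAGES in the whole data frame. If the goal is instead to keep the two largest WAGES within each ID (dropping the 3rd and 9th rows, as the question describes), one possible base R sketch uses ave() to rank rows inside each group. The names rank_in_id and top2 are illustrative, and grouping by ID alone is an assumption based on the rows the question wants removed.

```r
# Same example data as in the question
df <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 4, 4, 4, 5),
                 YEAR = c(2002, 2002, 2003, 2002, 2003, 2005, 2010, 2011, 2012, 2008),
                 WAGES = c(100, 98, 60, 120, 80, 300, 50, 40, 30, 500))

# Rank rows within each ID by descending WAGES (rank 1 = largest);
# ties.method = "first" breaks ties by row order, so ranks are unique
rank_in_id <- ave(-df$WAGES, df$ID,
                  FUN = function(x) rank(x, ties.method = "first"))

# Keep only the rows ranked 1 or 2 within their ID
top2 <- df[rank_in_id <= 2, ]

top2
```

This approach is vectorised and does not modify df in place, which may matter for a data frame of roughly 300,000 rows.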


Answer 1
0 Accepted 2018-06-20 13:44:22

Answer 2
0 2018-06-20 13:48:26
