
Trying to keep values of a column based on the unique values of two other columns

I want to keep only the 2 largest values in a column of a df, according to the unique pair of values in two other columns. E.g., I have this df:

df <- data.frame(ID    = c(1, 1, 1, 2, 2, 3, 4, 4, 4, 5),
                 YEAR  = c(2002, 2002, 2003, 2002, 2003, 2005, 2010, 2011, 2012, 2008),
                 WAGES = c(100, 98, 60, 120, 80, 300, 50, 40, 30, 500))

I want to drop the 3rd and 9th rows; that is, keep the two largest values in the WAGES column within each group. The df has roughly 300,000 rows.

You can use dplyr's top_n:

library(dplyr)

df %>% 
  group_by(ID) %>% 
  top_n(n = 2, wt = WAGES) 

## A tibble: 8 x 3
## Groups:   ID [5]
#     ID  YEAR WAGES
#  <dbl> <dbl> <dbl>
#1     1  2002   100
#2     1  2002    98
#3     2  2002   120
#4     2  2003    80
#5     3  2005   300
#6     4  2010    50
#7     4  2011    40
#8     5  2008   500
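
In dplyr 1.0.0 and later, top_n() is superseded by slice_max(). The following is a minimal sketch of the same grouped selection, assuming dplyr >= 1.0.0 is installed and that grouping by ID is what is wanted. The with_ties = FALSE argument and the top2 name are assumptions added here for illustration: without it, tied WAGES values could return more than two rows per group.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 3, 4, 4, 4, 5),
  YEAR  = c(2002, 2002, 2003, 2002, 2003, 2005, 2010, 2011, 2012, 2008),
  WAGES = c(100, 98, 60, 120, 80, 300, 50, 40, 30, 500)
)

# Keep at most the two rows with the largest WAGES within each ID
top2 <- df %>%
  group_by(ID) %>%
  slice_max(order_by = WAGES, n = 2, with_ties = FALSE) %>%
  ungroup()

top2
```

If the groups are instead meant to be the unique (ID, YEAR) pairs, group_by(ID, YEAR) would replace group_by(ID); with this sample data no pair has more than two rows, so nothing would be dropped.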

If I understood your question correctly, using base R:

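# Repeat twice: find the row with the largest WAGES and drop it
for (i in 1:2) {
    max_row <- which.max(df$WAGES)  # index of the current maximum
    df <- df[-c(max_row), ]         # remove that row, keep all columns
}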

df

#   ID YEAR WAGES
# 1  1 2002   100
# 2  1 2002    98
# 3  1 2003    60
# 4  2 2002   120
# 5  2 2003    80
# 7  4 2010    50
# 8  4 2011    40
# 9  4 2012    30

Note the - and the , in df <- df[-c(max_row), ]: the minus sign drops the indexed row, and the trailing comma means the index selects rows while keeping every column.
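
The loop above removes the two largest WAGES values in the whole data frame, so rows 6 and 10 are dropped rather than rows 3 and 9. If the goal is to keep the two largest values per ID, as the question's expected result suggests, one base R possibility is to rank rows within each group with ave(). This is a sketch under that assumption; rank_in_group and top2 are illustrative names, not part of the original answer.

```r
# Sample data from the question
df <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 3, 4, 4, 4, 5),
  YEAR  = c(2002, 2002, 2003, 2002, 2003, 2005, 2010, 2011, 2012, 2008),
  WAGES = c(100, 98, 60, 120, 80, 300, 50, 40, 30, 500)
)

# Rank rows within each ID by descending WAGES (negate so the largest gets rank 1);
# ties.method = "first" breaks ties by row order so ranks are unique per group
rank_in_group <- ave(-df$WAGES, df$ID,
                     FUN = function(x) rank(x, ties.method = "first"))

# Keep only rows ranked 1 or 2 within their ID
top2 <- df[rank_in_group <= 2, ]

top2
```

Because this is a single vectorised pass rather than a loop over rows, it should also be reasonable for a data frame of roughly 300,000 rows.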
