Trying to keep values of a column based on the unique values of two other columns
I want to keep only the 2 largest values in a column of a df according to the unique pair of values in two other columns. For example, I have this df:
df <- data.frame('ID'    = c(1,1,1,2,2,3,4,4,4,5),
                 'YEAR'  = c(2002,2002,2003,2002,2003,2005,2010,2011,2012,2008),
                 'WAGES' = c(100,98,60,120,80,300,50,40,30,500))
I want to drop the 3rd and 9th rows or, equivalently, keep the two largest values in the WAGES column within each ID. The df has roughly 300,000 rows.
You can use dplyr's top_n:
library(dplyr)
df %>%
  group_by(ID) %>%
  top_n(n = 2, wt = WAGES)
# # A tibble: 8 x 3
# # Groups: ID [5]
#   ID YEAR WAGES
#   <dbl> <dbl> <dbl>
# 1  1 2002   100
# 2  1 2002    98
# 3  2 2002   120
# 4  2 2003    80
# 5  3 2005   300
# 6  4 2010    50
# 7  4 2011    40
# 8  5 2008   500
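A possible variant, assuming dplyr 1.0.0 or later: top_n() has been superseded by slice_max(), and top_n() keeps every tied row, so an ID whose WAGES values tie could return more than two rows. Passing with_ties = FALSE to slice_max() caps each ID at exactly two rows. This is a sketch rather than part of the original answer; the name top2 is only illustrative.

```r
library(dplyr)

df <- data.frame(ID    = c(1, 1, 1, 2, 2, 3, 4, 4, 4, 5),
                 YEAR  = c(2002, 2002, 2003, 2002, 2003, 2005, 2010, 2011, 2012, 2008),
                 WAGES = c(100, 98, 60, 120, 80, 300, 50, 40, 30, 500))

# Keep the two largest WAGES rows within each ID;
# with_ties = FALSE breaks ties so no ID returns more than 2 rows.
top2 <- df %>%
  group_by(ID) %>%
  slice_max(WAGES, n = 2, with_ties = FALSE) %>%
  ungroup()

top2
```

Dropping the grouping with ungroup() at the end avoids surprises in later steps that would otherwise still operate per ID.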
If I understood your question correctly, using base R (note that this removes the two rows with the largest WAGES overall, not per ID):
# Twice: find the row with the largest WAGES and drop it
for (i in 1:2) {
  max_row <- which.max(df$WAGES)
  df <- df[-c(max_row), ]
}
df
#   ID YEAR WAGES
# 1  1 2002   100
# 2  1 2002    98
# 3  1 2003    60
# 4  2 2002   120
# 5  2 2003    80
# 7  4 2010    50
# 8  4 2011    40
# 9  4 2012    30
Note the - and the , in df <- df[-c(max_row), ]: the minus sign drops the row at index max_row, and the trailing comma keeps all columns.
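If the intent is to keep the two largest WAGES per ID, as the question describes, a base R sketch (an alternative, not the answerer's code) is to sort by ID and decreasing WAGES, number the rows within each ID with ave() and seq_along, and keep the rows numbered 1 and 2. It uses only vectorised base functions and no extra packages; the names ord, rank_in_id, and top2 are illustrative.

```r
df <- data.frame(ID    = c(1, 1, 1, 2, 2, 3, 4, 4, 4, 5),
                 YEAR  = c(2002, 2002, 2003, 2002, 2003, 2005, 2010, 2011, 2012, 2008),
                 WAGES = c(100, 98, 60, 120, 80, 300, 50, 40, 30, 500))

# Sort by ID, then by WAGES in decreasing order
ord <- df[order(df$ID, -df$WAGES), ]

# Number the rows within each ID: 1 = largest wage, 2 = next, ...
rank_in_id <- ave(ord$WAGES, ord$ID, FUN = seq_along)

# Keep the top two rows per ID (original rows 3 and 9 are dropped)
top2 <- ord[rank_in_id <= 2, ]
top2
```

Because ties are broken by sort order, each ID keeps at most two rows, matching slice_max(..., with_ties = FALSE) rather than top_n().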