R: Keep row from duplicates (several columns) based on condition
Basically, if rows are duplicated on a combination of specific columns, I want to keep only the row that has the lowest value in another column.
Example data (there's a lot more variance in my real data):
ID  BilagNr  Henstand    Aftale  Belob  RP  Pos  Dps  Udlign  rykkedage
1   111      01-01-2017  1111    100    YA  1    1            10
1   122      02-01-2017  1222    100    YA  1         1       40
1   111      01-07-2017  1111    100    YA  1    1            100
2   222      01-01-2017  2121    299    YA  1         4       5
2   222      01-01-2017  2121    299    YA  1         4       98
2   212      01-05-2017  7654    299    BS  1                 3
3   333      01-08-2017  7654    345    BS  2                 45
4   444      01-01-2017  7654    345    BS  3    1    4       68
4   411      09-01-2017  7654    345    BS  1         4       43
5   555      01-01-2017  5555    700    BS  1                 13
5   555      01-01-2017  5555    700    BS  1                 67
6   666      01-01-2017  4720    100    BS  1                 23
6   666      03-01-2017  1234    100    BS  2         1       23
6   666      07-08-2017  1234    120    BS  3    1    1       23
7   777      01-01-2017  1234    90     BS  1         1       23
7   777      01-01-2017  1234    90     BS  1         1       199
So I want to keep only these:
ID  BilagNr  Henstand    Aftale  Belob  RP  Pos  Dps  Udlign  rykkedage
1   111      01-01-2017  1111    100    YA  1    1            10
1   122      02-01-2017  1222    100    YA  1         1       40
2   222      01-01-2017  2121    299    YA  1         4       5
2   212      01-05-2017  7654    299    BS  1                 3
3   333      01-08-2017  7654    345    BS  2                 45
4   444      01-01-2017  7654    345    BS  3    1    4       68
4   411      09-01-2017  7654    345    BS  1         4       43
5   555      01-01-2017  5555    700    BS  1                 13
6   666      01-01-2017  4720    100    BS  1                 23
6   666      03-01-2017  1234    100    BS  2         1       23
6   666      07-08-2017  1234    120    BS  3    1    1       23
7   777      01-01-2017  1234    90     BS  1         1       23
In other words:
If rows are duplicated on the combination of the columns ID, BilagNr, Henstand, Aftale, Belob, RP, Pos, Dps, Udlign, then keep only one of the duplicated rows, namely the one with the smallest rykkedage.
I hope that makes sense.
Furthermore, is it possible to add code that keeps all of the duplicated rows that share the same value in rykkedage? I have a large dataset, and I'm not sure whether this is even a problem.
Thank you!
We can group by 'ID', 'BilagNr', ..., 'Udlign', and then slice the row at the index of the minimum value of 'rykkedage' within each group:
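library(dplyr)
df1 %>%
  # rows count as duplicates when all nine key columns match
  group_by(ID, BilagNr, Henstand, Aftale, Belob, RP, Pos, Dps, Udlign) %>%
  # keep the single row with the smallest rykkedage in each group
  slice(which.min(rykkedage))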
# A tibble: 13 x 10
# Groups: ID, BilagNr, Henstand, Aftale, Belob, RP, Pos, Dps, Udlign [13]
# ID BilagNr Henstand Aftale Belob RP Pos Dps Udlign rykkedage
# <int> <int> <chr> <int> <int> <chr> <int> <int> <int> <int>
# 1 1 111 01-01-2017 1111 100 YA 1 1 NA 10
# 2 1 111 01-07-2017 1111 100 YA 1 1 NA 100
# 3 1 122 02-01-2017 1222 100 YA 1 NA 1 40
# 4 2 212 01-05-2017 7654 299 BS 1 NA NA 3
# 5 2 222 01-01-2017 2121 299 YA 1 NA 4 5
# 6 3 333 01-08-2017 7654 345 BS 2 NA NA 45
# 7 4 411 09-01-2017 7654 345 BS 1 NA 4 43
# 8 4 444 01-01-2017 7654 345 BS 3 1 4 68
# 9 5 555 01-01-2017 5555 700 BS 1 NA NA 13
#10 6 666 01-01-2017 4720 100 BS 1 NA NA 23
#11 6 666 03-01-2017 1234 100 BS 2 NA 1 23
#12 6 666 07-08-2017 1234 120 BS 3 1 1 23
#13 7 777 01-01-2017 1234 90 BS 1 NA 1 23
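On the follow-up about ties: `which.min()` returns only the first index of the minimum, so when several duplicated rows share the same smallest 'rykkedage', the code above keeps just one of them. A minimal sketch of one way to keep all tied rows is to filter on equality with the group minimum instead. The data frame `toy` and its column `key` are hypothetical stand-ins for the real data and its nine grouping columns.

```r
library(dplyr)

# Hypothetical toy data: 'key' stands in for the nine grouping columns
toy <- data.frame(
  key       = c("a", "a", "a", "b", "b"),
  rykkedage = c(10, 10, 40, 5, 98)
)

# Keep every row whose rykkedage equals its group minimum, so ties survive
keep_ties <- toy %>%
  group_by(key) %>%
  filter(rykkedage == min(rykkedage)) %>%
  ungroup()

# For comparison: exactly one row per group (the first minimum wins)
keep_one <- toy %>%
  group_by(key) %>%
  slice(which.min(rykkedage)) %>%
  ungroup()
```

Here `keep_ties` retains both rows of group "a" with 'rykkedage' 10, while `keep_one` retains only the first. If 'rykkedage' can contain NA, `min()` returns NA for that group and `filter()` drops all of its rows, so `min(rykkedage, na.rm = TRUE)` may be needed. In dplyr 1.0.0 and later, `slice_min(rykkedage, n = 1, with_ties = TRUE)` should behave similarly.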