
R: Keep row from duplicates (several columns) based on condition

Basically, I want the following: if rows are duplicated on a combination of specific columns, keep only the row with the lowest value in another column.

Example data (my real data has a lot more variance):

ID  BilagNr Henstand    Aftale  Belob   RP  Pos Dps Udlign  rykkedage
1   111     01-01-2017   1111   100     YA   1   1               10
1   122     02-01-2017   1222   100     YA   1        1          40
1   111     01-07-2017   1111   100     YA   1   1              100
2   222     01-01-2017   2121   299     YA   1        4          5
2   222     01-01-2017   2121   299     YA   1        4          98
2   212     01-05-2017   7654   299     BS   1                   3
3   333     01-08-2017   7654   345     BS   2                   45
4   444     01-01-2017   7654   345     BS   3   1    4          68
4   411     09-01-2017   7654   345     BS   1        4          43
5   555     01-01-2017   5555   700     BS   1                   13
5   555     01-01-2017   5555   700     BS   1                   67
6   666     01-01-2017   4720   100     BS   1                   23
6   666     03-01-2017   1234   100     BS   2        1          23
6   666     07-08-2017   1234   120     BS   3   1    1          23
7   777     01-01-2017   1234   90      BS   1        1          23
7   777     01-01-2017   1234   90      BS   1        1         199

So I want to keep only these rows:

ID  BilagNr Henstand    Aftale  Belob   RP  Pos Dps Udlign  rykkedage
1   111     01-01-2017   1111   100     YA   1   1               10
1   122     02-01-2017   1222   100     YA   1        1          40
2   222     01-01-2017   2121   299     YA   1        4          5
2   212     01-05-2017   7654   299     BS   1                   3
3   333     01-08-2017   7654   345     BS   2                   45
4   444     01-01-2017   7654   345     BS   3   1    4          68
4   411     09-01-2017   7654   345     BS   1        4          43
5   555     01-01-2017   5555   700     BS   1                   13
6   666     01-01-2017   4720   100     BS   1                   23
6   666     03-01-2017   1234   100     BS   2        1          23
6   666     07-08-2017   1234   120     BS   3   1    1          23
7   777     01-01-2017   1234   90      BS   1        1          23

In other words:

If rows are duplicated on the combination of the columns ID, BilagNr, Henstand, Aftale, Belob, RP, Pos, Dps and Udlign, keep only one of the duplicated rows: the one with the smallest value of rykkedage.

I hope that makes sense.

Maybe this picture will help: [image]

Furthermore, is it possible to add code that keeps duplicated rows that have the same value in rykkedage? I have a large dataset, and I'm not sure whether this is even a problem.

Thank you!

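We can group by 'ID', 'BilagNr', ..., 'Udlign', and then slice the row at the index of the minimum value of 'rykkedage' in each group. Note that this returns 13 rows rather than the 12 listed in the question: the row with ID 1, BilagNr 111 and Henstand 01-07-2017 differs from the 01-01-2017 row in Henstand, so it is not a duplicate on all nine columns.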

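library(dplyr)
df1 %>%
   # rows count as duplicates when all nine key columns match
   group_by(ID, BilagNr, Henstand, Aftale, Belob, RP, Pos, Dps, Udlign) %>%
   # within each group, keep the row with the smallest rykkedage
   slice(which.min(rykkedage))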
# A tibble: 13 x 10
# Groups:   ID, BilagNr, Henstand, Aftale, Belob, RP, Pos, Dps, Udlign [13]
#      ID BilagNr Henstand   Aftale Belob RP      Pos   Dps Udlign rykkedage
#   <int>   <int> <chr>       <int> <int> <chr> <int> <int>  <int>     <int>
# 1     1     111 01-01-2017   1111   100 YA        1     1     NA        10
# 2     1     111 01-07-2017   1111   100 YA        1     1     NA       100
# 3     1     122 02-01-2017   1222   100 YA        1    NA      1        40
# 4     2     212 01-05-2017   7654   299 BS        1    NA     NA         3
# 5     2     222 01-01-2017   2121   299 YA        1    NA      4         5
# 6     3     333 01-08-2017   7654   345 BS        2    NA     NA        45
# 7     4     411 09-01-2017   7654   345 BS        1    NA      4        43
# 8     4     444 01-01-2017   7654   345 BS        3     1      4        68
# 9     5     555 01-01-2017   5555   700 BS        1    NA     NA        13
#10     6     666 01-01-2017   4720   100 BS        1    NA     NA        23
#11     6     666 03-01-2017   1234   100 BS        2    NA      1        23
#12     6     666 07-08-2017   1234   120 BS        3     1      1        23
#13     7     777 01-01-2017   1234    90 BS        1    NA      1        23
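
On the follow-up question about ties: which.min() returns only the first minimum, so if two duplicated rows share the same lowest rykkedage, only one of them survives. A minimal sketch of one way to keep all tied rows is to filter on the group minimum instead. The data frame `toy` below is hypothetical (a reduced set of key columns with a tie added on purpose, since the example data contains none), and the names `keep_ties` and `first_only` are only for illustration.

```r
library(dplyr)

# Hypothetical toy data, not the asker's df1: three key columns plus rykkedage.
# Rows 1-3 share one key, and rows 1 and 2 tie on the minimum value.
toy <- data.frame(
  ID        = c(1, 1, 1, 2, 2),
  BilagNr   = c(111, 111, 111, 222, 222),
  Udlign    = c(NA, NA, NA, 4, 4),
  rykkedage = c(10, 10, 100, 5, 98)
)

# Keep every row whose rykkedage equals its group minimum, so ties survive.
keep_ties <- toy %>%
  group_by(ID, BilagNr, Udlign) %>%
  filter(rykkedage == min(rykkedage)) %>%
  ungroup()

# For comparison: which.min() picks the first minimum only, so the tie is dropped.
first_only <- toy %>%
  group_by(ID, BilagNr, Udlign) %>%
  slice(which.min(rykkedage)) %>%
  ungroup()

nrow(keep_ties)   # 3 rows: both tied rows for ID 1, plus one row for ID 2
nrow(first_only)  # 2 rows: one per group
```

With the real data, the same filter() call should work with the full nine-column group_by() from above. If rykkedage can contain NA, min() returns NA for that group and filter() drops all of its rows, so min(rykkedage, na.rm = TRUE) may be needed. In dplyr 1.0.0 and later, slice_min(rykkedage, n = 1) also keeps ties by default.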
