R programming: How to remove duplicates in a column based on values of another column
A B
15 O
20 O
12 C
15 C
50 C
25 O
50 O
19 O
50 M
I have data in the above format. I want to select unique rows based on the unique elements in column A, but in case there are duplicates I need to refer to column B and select the row which has code 'C'.
Expected output:
A B
20 O
12 C
15 C
50 C
25 O
19 O
Can anyone help?
We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), order by the logical condition (B == "O") so that the 'O' rows sort last, group by 'A', and get the first row of each group with head.
library(data.table)
setDT(df1)[order(B=="O"), head(.SD, 1), A]
# A B
#1: 12 C
#2: 15 C
#3: 50 C
#4: 20 O
#5: 25 O
#6: 19 O
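Note that ordering on B == "O" only pushes the 'O' rows to the back; for A = 50, 'C' wins over 'M' because it happens to come first in the data. Below is a minimal sketch that ranks 'C' first explicitly. The construction of df1 is rebuilt by hand from the question's sample, and sorting on B != "C" is a variation on the answer, not part of it.

```r
library(data.table)

# Sample data from the question, rebuilt by hand (assumed to be a plain data.frame)
df1 <- data.frame(
  A = c(15, 20, 12, 15, 50, 25, 50, 19, 50),
  B = c("O", "O", "C", "C", "C", "O", "O", "O", "M"),
  stringsAsFactors = FALSE
)

# FALSE sorts before TRUE, so rows with B == "C" come first;
# head(.SD, 1) then keeps the first row within each group of A
res <- as.data.table(df1)[order(B != "C"), head(.SD, 1), by = A]
res
```

as.data.table() is used here instead of setDT() so that df1 itself is left unchanged.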
Or this can be done with base R by ordering the rows and then dropping the repeated 'A' values with duplicated.
df2 <- df1[order(df1$A, df1$B=="O"),]
df2[!duplicated(df2$A),]
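The same idea as a self-contained base R sketch. The data frame is rebuilt by hand from the question's sample, the name out is hypothetical, and the sort key B != "C" is an assumption chosen so that 'C' is preferred over any other code, not only over 'O'.

```r
# Sample data from the question, rebuilt by hand
df1 <- data.frame(
  A = c(15, 20, 12, 15, 50, 25, 50, 19, 50),
  B = c("O", "O", "C", "C", "C", "O", "O", "O", "M"),
  stringsAsFactors = FALSE
)

# Sort by A; within each A, rows with B == "C" come first (FALSE < TRUE)
df2 <- df1[order(df1$A, df1$B != "C"), ]

# duplicated() flags every repeat after the first occurrence,
# so negating it keeps exactly one row per value of A
out <- df2[!duplicated(df2$A), ]
out
```

Unlike the data.table version, the result here comes back sorted by A rather than in order of first appearance.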