
R: remove duplicated values across rows and columns

I've found many pages about finding duplicated elements in a list or duplicated rows in a data frame. However, I want to search for duplicated elements throughout the entire data frame. Take this as an example:

df
     coupon1    coupon2    coupon3
1         10         11         12
2         13         16         15
3         16         17         18
4         19         20         21
5         22         23         24
6         25         26         27

You'll notice that df[2,2] and df[3,1] hold the same value (16). When I run

duplicated(df)

it returns six FALSEs, because no entire row is duplicated, only a single element. How can I check for duplicated values anywhere in the data frame? I would like to know both that a duplicate exists and what its value is (and likewise if there are multiple duplicates).

This finds duplicates across the whole data frame, but it scans column by column. So (3,1) is still FALSE, because it is the first 16 encountered; only the later occurrence at (2,2) is flagged.

m <- matrix(duplicated(unlist(df)), ncol=ncol(df))
#      [,1]  [,2]  [,3]
#[1,] FALSE FALSE FALSE
#[2,] FALSE  TRUE FALSE
#[3,] FALSE FALSE FALSE
#[4,] FALSE FALSE FALSE
#[5,] FALSE FALSE FALSE
#[6,] FALSE FALSE FALSE

You can then use it however you'd like, for example:

df[m]
#[1] 16
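
If you want both cells that hold 16 flagged, not just the later one, one possible variation is to combine duplicated() with its fromLast = TRUE form. The sketch below rebuilds the example data frame from the question; the names v, all_dupes, pos and dupe_values are just illustrative and not part of the original answer.

```r
# Example data from the question
df <- data.frame(coupon1 = c(10, 13, 16, 19, 22, 25),
                 coupon2 = c(11, 16, 17, 20, 23, 26),
                 coupon3 = c(12, 15, 18, 21, 24, 27))

v <- unlist(df)  # flattens the data frame column by column

# TRUE for every cell whose value occurs more than once anywhere in df
all_dupes <- matrix(duplicated(v) | duplicated(v, fromLast = TRUE),
                    ncol = ncol(df))

# Row/column positions of the flagged cells
pos <- which(all_dupes, arr.ind = TRUE)

# The distinct values that are duplicated
dupe_values <- unique(v[duplicated(v)])
```

Here pos should list (3,1) and (2,2), and dupe_values should contain 16 once, however many times it repeats.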
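
Another option is to flatten the data frame with stack() and look for duplicates in the values column:

which(duplicated(stack(yourdf)[,1]))
#[1] 8
stack(yourdf)[,1][which(duplicated(stack(yourdf)[,1]))]
#[1] 16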
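
To answer the "does a duplicate exist, and which values" part directly, a minimal sketch using anyDuplicated() and table() might look like this. It again rebuilds the example data; has_dupes, counts and repeated are illustrative names, and table() is a swapped-in technique rather than something either answer above uses.

```r
# Example data from the question
df <- data.frame(coupon1 = c(10, 13, 16, 19, 22, 25),
                 coupon2 = c(11, 16, 17, 20, 23, 26),
                 coupon3 = c(12, 15, 18, 21, 24, 27))

v <- unlist(df)

# anyDuplicated() returns the index of the first duplicate, or 0 if none
has_dupes <- anyDuplicated(v) > 0

# Count how often each value occurs and keep those seen more than once
counts <- table(v)
repeated <- counts[counts > 1]

# The duplicated values themselves, converted back from the table's names
repeated_values <- as.numeric(names(repeated))
```

Because table() reports a count per value, this also covers the case where several different values are duplicated, or one value appears three or more times.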


 