How to detect duplicates of single values across all rows and columns of an R data.frame
I have a large dataset consisting of a header row and a series of values in each column. I want to detect the presence and number of duplicates of these values within the whole dataset.
1 2 3 4 5 6 7
734 456 346 545 874 734 455
734 783 482 545 456 948 483
So, for example, it would detect 734 three times, 456 twice, etc.
I've tried using the duplicated() function in R, but it seems to work only on whole rows or whole columns. Using
duplicated(df)
doesn't pick up any duplicates, though I know there are two duplicates in the first row.
So I'm asking how to detect duplicates both within and between columns/rows.
Cheers
You can use table() and data.frame() to see how often each value occurs:
data.frame(table(v))
which gives
v Freq
1 1 1
2 2 1
3 3 1
4 4 1
5 5 1
6 6 1
7 7 1
8 346 1
9 455 1
10 456 2
11 482 1
12 483 1
13 545 2
14 734 3
15 783 1
16 874 1
17 948 1
DATA
v <- c(1, 2, 3, 4, 5, 6, 7, 734, 456, 346, 545, 874, 734, 455, 734,
783, 482, 545, 456, 948, 483)
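If the values sit in a data frame rather than in a vector, a minimal sketch along the same lines is to flatten the data frame with unlist() and keep only the rows of the frequency table whose count is above one. The data frame name df and the column names V1 to V7 are assumptions made for illustration, and this version leaves out the header values 1 to 7.

```r
# Hypothetical data frame holding the two rows of values from the question
df <- data.frame(
  V1 = c(734, 734), V2 = c(456, 783), V3 = c(346, 482), V4 = c(545, 545),
  V5 = c(874, 456), V6 = c(734, 948), V7 = c(455, 483)
)

# Flatten every column into one vector, then count each distinct value
counts <- data.frame(table(v = unlist(df)))

# Keep only the values that occur more than once
dups <- counts[counts$Freq > 1, ]
print(dups)
```

With the sample data this should leave 456 and 545 (twice each) and 734 (three times).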
You can convert the data frame to a vector and then use table(), as follows:
library(data.table)
library(dplyr)
df<-fread("734 456 346 545 874 734 455
734 783 482 545 456 948 483")
df%>%unlist()%>%table()
# 346 455 456 482 483 545 734 783 874 948
# 1 1 2 1 1 2 3 1 1 1
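As for why duplicated(df) finds nothing: on a data frame, duplicated() compares whole rows, and the two rows here differ. A rough base-R sketch, assuming the same two rows stored in a hypothetical data frame df with columns V1 to V7, is to apply duplicated() to the flattened values instead and then map the hits back to row and column positions. The helper names m, is_dup, dup_cells and located are made up for this example.

```r
# Hypothetical data frame with the question's two rows (column names assumed)
df <- data.frame(
  V1 = c(734, 734), V2 = c(456, 783), V3 = c(346, 482), V4 = c(545, 545),
  V5 = c(874, 456), V6 = c(734, 948), V7 = c(455, 483)
)

# On a data frame, duplicated() compares entire rows, so both entries are FALSE
row_dups <- duplicated(df)

# Work on the individual values instead (as.vector flattens column by column)
m <- as.matrix(df)
vals <- as.vector(m)

# fromLast = TRUE also flags the first occurrence of each repeated value
is_dup <- duplicated(vals) | duplicated(vals, fromLast = TRUE)

# Row and column index of every cell whose value appears more than once
dup_cells <- which(matrix(is_dup, nrow = nrow(m)), arr.ind = TRUE)
located <- data.frame(dup_cells, value = m[dup_cells])
print(located)
```

This complements the table() approach: table() gives the counts, while the positions show where in the data frame each repeated value sits.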