Identify duplicates in a particular column of a data frame in R
For a sample dataframe:
df <- structure(list(code = c("a1", "a1", "b2", "v4", "f5", "f5", "h7",
"a1"), name = c("katie", "katie", "sally", "tom", "amy", "amy",
"ash", "james"), number = c(3.5, 3.5, 2, 6, 4, 4, 7, 3)), .Names = c("code",
"name", "number"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-8L), spec = structure(list(cols = structure(list(code = structure(list(), class = c("collector_character",
"collector")), name = structure(list(), class = c("collector_character",
"collector")), number = structure(list(), class = c("collector_double",
"collector"))), .Names = c("code", "name", "number")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
I want to produce a dataframe of rows that have duplicates in one specific column only.
I know I can do:
df[duplicated(df),]
But that compares entire rows. For my larger real dataframe, I want to name a single column and flag the duplicates in that column only.
Any ideas?
duplicated() accepts vectors, so you can pass it just the column you care about:
df[duplicated(df$name), ]
  code  name number
2   a1 katie    3.5
6   f5   amy    4.0
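Note that duplicated() flags only the second and later occurrences, which is why rows 1 and 5 are missing from the output above. If you want every row that shares a value, including the first copy, one option is to combine a forward and a backward scan. The following is a minimal sketch using base R only; the data frame is rebuilt with data.frame() for brevity, and the names later_copies, is_dup, all_copies and code_dups are purely illustrative.

```r
# Small data frame mirroring the question's sample data (base R only).
df <- data.frame(
  code   = c("a1", "a1", "b2", "v4", "f5", "f5", "h7", "a1"),
  name   = c("katie", "katie", "sally", "tom", "amy", "amy", "ash", "james"),
  number = c(3.5, 3.5, 2, 6, 4, 4, 7, 3),
  stringsAsFactors = FALSE
)

# duplicated() marks only the second and later occurrences of a value.
later_copies <- df[duplicated(df$name), ]

# Scanning from both ends marks every member of a duplicated group,
# including the first occurrence.
is_dup <- duplicated(df$name) | duplicated(df$name, fromLast = TRUE)
all_copies <- df[is_dup, ]

# The same idea applied to a different column (here `code`).
code_dups <- df[duplicated(df$code) | duplicated(df$code, fromLast = TRUE), ]
```

Here all_copies holds rows 1, 2, 5 and 6, while code_dups also picks up row 8, because "a1" appears three times in the code column even though the name on that row differs.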