
Identify and keep duplicates with R

Identify and keep only rows with duplicate elements in R

I have a large data frame (df) with more than 20 columns, and I need to identify and keep the rows whose values in certain specified columns are duplicated. My plan was to create two new columns. The first would hold the specified columns' values concatenated together. The second would be a binary flag telling me whether the value in the first column is duplicated. My df looks like this:

[image not available]

For the first column I tried:

res1 <-mutate(Prac_df, Con_cat =apply(Prac_df[order(PIn, Age, Sex),], 1, function(x) paste0(x, collapse = "_")))

I don't think that worked, and I'm not sure how to create the second column, which I will need in order to run a logistic regression.

After my two columns are added, it would look like this: [image not available]

Try this:

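library(dplyr)

res1 <- Prac_df %>%
  # Rows that share the same PIN, Age and Sex form one group
  group_by(PIN, Age, Sex) %>%
  # TRUE for the second and later rows of a group; the first occurrence stays FALSE
  mutate(isDuplicated = row_number() > 1) %>%
  ungroup()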
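
If every row that shares its PIN, Age and Sex values with another row should be flagged, including the first occurrence, one possible base-R sketch is shown here. The sample data are made up for illustration; only the column names PIN, Age, Sex and Con_cat come from the question, and key_cols and dup_rows are hypothetical names.

```r
# Hypothetical sample data; only the column names come from the question.
Prac_df <- data.frame(
  PIN = c(1, 1, 2, 3, 3, 4),
  Age = c(30, 30, 41, 25, 25, 25),
  Sex = c("F", "F", "M", "M", "M", "F"),
  stringsAsFactors = FALSE
)

# Columns that define what counts as a duplicate (assumed from the question).
key_cols <- c("PIN", "Age", "Sex")

# First new column: the key columns pasted together row by row.
Prac_df$Con_cat <- do.call(paste, c(Prac_df[key_cols], sep = "_"))

# Second new column: 1 if the key occurs more than once, otherwise 0.
# duplicated() alone skips the first occurrence, so scan from both ends.
Prac_df$isDuplicated <- as.integer(
  duplicated(Prac_df$Con_cat) | duplicated(Prac_df$Con_cat, fromLast = TRUE)
)

# Keep only the rows whose key is shared with at least one other row.
dup_rows <- Prac_df[Prac_df$isDuplicated == 1, ]
```

With dplyr, replacing row_number() > 1 with n() > 1 inside the grouped mutate() should give the same flag, since n() is the number of rows in the current group.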
