Identify and keep duplicates in R

Question

Identify and keep only rows with duplicate elements in R

I have a large df with more than 20 columns, and I need to identify and keep the rows that have duplicate elements in specified columns. My approach was going to be to create two new columns. The first column would hold the concatenated elements. The second column would be a binary flag telling me whether the value in the first column is duplicated. My df looks like this:

For the first column I tried:

res1 <-mutate(Prac_df, Con_cat =apply(Prac_df[order(PIn, Age, Sex),], 1, function(x) paste0(x, collapse = "_")))

I don't think that worked, and I'm not sure how to create the second column, which I will need in order to run a logistic regression.

After my two columns are added, it would look like this:

Answer 1

Try this:

library(dplyr)

res1 <- Prac_df %>%  
  group_by(PIN, Age, Sex) %>% 
  mutate(isDuplicated = row_number() > 1) %>% 
  ungroup()
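
Note that `row_number() > 1` flags only the second and later occurrences within each group, so the first row of a duplicated set stays `FALSE`. If the goal is to flag (and keep) every row whose `PIN`, `Age`, `Sex` combination appears more than once, something along these lines may be closer. This is only a sketch: the sample data, `res2`, `dups_only`, `key_cols` and `flag_base` are made up for illustration, and the 0/1 coding of the flag is an assumption based on the "binary" column described in the question.

```r
library(dplyr)

# Hypothetical sample data; only the column names PIN, Age and Sex
# are taken from the code above.
Prac_df <- data.frame(
  PIN = c(1, 1, 2, 3, 3, 3),
  Age = c(30, 30, 41, 25, 25, 26),
  Sex = c("F", "F", "M", "M", "M", "M"),
  stringsAsFactors = FALSE
)

res2 <- Prac_df %>%
  # Concatenated key column, as described in the question
  mutate(Con_cat = paste(PIN, Age, Sex, sep = "_")) %>%
  group_by(PIN, Age, Sex) %>%
  # 1 for every row whose combination occurs more than once, otherwise 0
  mutate(isDuplicated = as.integer(n() > 1)) %>%
  ungroup()

# Keep only the rows that belong to a duplicated combination
dups_only <- filter(res2, isDuplicated == 1)

# Base R version of the same flag, without dplyr
key_cols  <- Prac_df[, c("PIN", "Age", "Sex")]
flag_base <- duplicated(key_cols) | duplicated(key_cols, fromLast = TRUE)
```

Grouping on the columns directly means the `Con_cat` column is not needed to compute the flag; it is kept here only because the question asked for it.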

Answer 1, 2019-07-11 21:02:09
