How to remove duplicates in R if a specific column has a value

Question

I need to delete some rows in my dataset based on a given condition. Please go through the sample data for reference.

ID  Date       Dur
123 01/05/2000 3
123 08/04/2002 6
564 04/04/2012 2
741 01/08/2011 5
789 02/03/2009 1
789 08/01/2010 NA
789 05/05/2011 NA
852 06/06/2015 3
852 03/02/2016 NA
155 03/02/2008 NA
155 01/01/2009 NA
159 07/07/2008 NA

My main concern is the Dur column. For any ID that has more than one record, I have to delete the rows where Dur is not NA. IDs 123, 789 and 852 have more than one record and at least one Dur value, so this means removing ID 123 entirely (both of its rows have a Dur value) and the first record of 789 and of 852. I don't want to delete IDs that have a single record with a Dur value (564, 741), or any rows where Dur is NA.

Expected output:

ID  Date       Dur
564 04/04/2012 2
741 01/08/2011 5
789 08/01/2010 NA
789 05/05/2011 NA
852 03/02/2016 NA
155 03/02/2008 NA
155 01/01/2009 NA
159 07/07/2008 NA

Kindly suggest code to solve the issue. Thanks in advance!

Answer 1

One way would be to select rows where the number of rows in the group is 1 or where Dur is NA.

This can be written in dplyr as:

library(dplyr)
df %>% group_by(ID) %>% filter(n() == 1 | is.na(Dur))

#    ID Date         Dur
#  <int> <chr>      <int>
#1   564 04/04/2012     2
#2   741 01/08/2011     5
#3   789 08/01/2010    NA
#4   789 05/05/2011    NA
#5   852 03/02/2016    NA
#6   155 03/02/2008    NA
#7   155 01/01/2009    NA
#8   159 07/07/2008    NA

Using data.table:

library(data.table)
setDT(df)[, .SD[.N == 1 | is.na(Dur)], ID]

and base R:

subset(df, ave(is.na(Dur), ID, FUN = function(x) length(x) == 1 | x))

data

df <- structure(list(ID = c(123L, 123L, 564L, 741L, 789L, 789L, 789L, 
852L, 852L, 155L, 155L, 159L), Date = c("01/05/2000", "08/04/2002", 
"04/04/2012", "01/08/2011", "02/03/2009", "08/01/2010", "05/05/2011", 
"06/06/2015", "03/02/2016", "03/02/2008", "01/01/2009", "07/07/2008"
), Dur = c(3L, 6L, 2L, 5L, 1L, NA, NA, 3L, NA, NA, NA, NA)), 
class = "data.frame", row.names = c(NA, -12L))
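
The base R one-liner packs the grouping and the test into a single ave() call. A minimal sketch that spells the same condition out in steps may make it easier to follow. It assumes the sample data from the question, rebuilt here so the snippet runs on its own; group_size, keep and result are illustrative names, not part of the original answer.

```r
# Sample data from the question, rebuilt so the snippet is self-contained
df <- data.frame(
  ID   = c(123L, 123L, 564L, 741L, 789L, 789L, 789L, 852L, 852L, 155L, 155L, 159L),
  Date = c("01/05/2000", "08/04/2002", "04/04/2012", "01/08/2011", "02/03/2009",
           "08/01/2010", "05/05/2011", "06/06/2015", "03/02/2016", "03/02/2008",
           "01/01/2009", "07/07/2008"),
  Dur  = c(3L, 6L, 2L, 5L, 1L, NA, NA, 3L, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Number of rows in each ID group, repeated on every row of that group
group_size <- ave(seq_along(df$ID), df$ID, FUN = length)

# Keep a row if its ID occurs only once, or if its Dur is missing
keep <- group_size == 1 | is.na(df$Dur)

result <- df[keep, ]
result
```

Storing group_size as its own vector also makes it easy to inspect which IDs are treated as single-record groups before any rows are dropped.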

Answer 2

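We can use .I in data.table (assuming df1 holds the sample data shown in the question):

library(data.table)
# .I gives the row numbers within each ID group; keep those that pass the test
setDT(df1)[df1[, .I[.N == 1 | is.na(Dur)], ID]$V1]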
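
In data.table, .I holds row numbers, so the inner call collects the row numbers to keep per ID and the outer call subsets by them. A rough base R analogue of that index-based idea is sketched below, for readers without data.table. It is an illustration under the assumption that the data is the sample from the question; rows_by_id, keep_rows and idx are hypothetical names.

```r
# Sample data from the question, rebuilt so the snippet is self-contained
df <- data.frame(
  ID   = c(123L, 123L, 564L, 741L, 789L, 789L, 789L, 852L, 852L, 155L, 155L, 159L),
  Date = c("01/05/2000", "08/04/2002", "04/04/2012", "01/08/2011", "02/03/2009",
           "08/01/2010", "05/05/2011", "06/06/2015", "03/02/2016", "03/02/2008",
           "01/01/2009", "07/07/2008"),
  Dur  = c(3L, 6L, 2L, 5L, 1L, NA, NA, 3L, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Row numbers of df, split into one integer vector per ID
rows_by_id <- split(seq_len(nrow(df)), df$ID)

# Within each group: keep every row number if the group has a single row,
# otherwise keep only the row numbers whose Dur is NA
keep_rows <- lapply(rows_by_id, function(i) i[length(i) == 1 | is.na(df$Dur[i])])

# split() orders groups by ID, so sort the indices to restore the original row order
idx <- sort(unlist(keep_rows, use.names = FALSE))

result <- df[idx, ]
result
```

Working with row indices rather than a logical mask keeps the selection step separate from the subsetting step, which is the same separation the .I idiom relies on.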

Answer 1: score 3, accepted, 2020-09-08 07:39:41
Answer 2: score -1, 2020-09-08 22:43:21
