R: Remove rows and replace values using conditions on multiple columns

Question

I want to filter out all rows where var3 < 5 while keeping at least one occurrence of each value of var1.

> foo <- data.frame(var1=c(1, 1, 8, 8, 5, 5, 5), var2=c(1,2,3,2,4,6,8), var3=c(7,1,1,1,1,1,6))
> foo
  var1 var2 var3
1    1    1    7
2    1    2    1
3    8    3    1
4    8    2    1
5    5    4    1
6    5    6    1
7    5    8    6

subset(foo, (foo$var3>=5)) would remove rows 2 to 6, and I would lose var1==8.

I want to remove a row if another row with the same value of var1 fulfills the condition foo$var3 >= 5 (see row 5).
I want to keep the row, assigning NA to var2 and var3, if no occurrence of its var1 value fulfills the condition foo$var3 >= 5.

This is the result I expect:

  var1 var2 var3
1    1    1    7
3    8   NA   NA
7    5    8    6

This is the closest I got:

> foo$var3[ foo$var3 < 5 ] = NA
> foo$var2[ is.na(foo$var3) ] = NA
> foo
  var1 var2 var3
1    1    1    7
2    1   NA   NA
3    8   NA   NA
4    8   NA   NA
5    5   NA   NA
6    5   NA   NA
7    5    8    6

Now I just need to know how to conditionally remove the right rows (2, 3 or 4, 5, 6): remove the row if var2 and var3 are NA and the value of var1 occurs more than once.

But there is surely a much simpler and more elegant way to approach this little problem.

Edit: changed foo to resemble my use case more closely.

Answer 1

The fastest way is to use merge:

> merge(foo[foo$var3>5,],unique(foo$var1),by.x=1,by.y=1,all.y=T)
  var1 var2 var3
1    1    1    7
2    5    8    6
3    8   NA   NA

unique(foo$var1) gives the unique values in var1. These are matched against the rows of the data frame where var3 is larger than five (change > to >= to include var3 == 5, as the question's condition does). by.x=1 and by.y=1 say to join on the first column of each argument, and all.y=T says that every value in y must appear in the result, with NA filling the columns that have no match. See also ?merge.
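The same join can be written with a named key column, which may be easier to read than the positional by.x=1, by.y=1. This is only a sketch, not the answerer's exact code: it wraps the unique keys in a one-column data frame (the name keys is introduced here for illustration) and uses >= to match the condition stated in the question.

```r
foo <- data.frame(var1 = c(1, 1, 8, 8, 5, 5, 5),
                  var2 = c(1, 2, 3, 2, 4, 6, 8),
                  var3 = c(7, 1, 1, 1, 1, 1, 6))

# One row per distinct var1 value (hypothetical helper name)
keys <- data.frame(var1 = unique(foo$var1))

# Right outer join: every key survives, unmatched keys get NA in var2/var3
res <- merge(foo[foo$var3 >= 5, ], keys, by = "var1", all.y = TRUE)
res
```

Naming the key column also protects against the column order of foo changing later.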

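If you want to preserve the original order:

> merge(foo[foo$var3>5,],unique(foo$var1),by.x=1,by.y=1,
+ all.y=T)[rank(unique(foo$var1)),]
  var1 var2 var3
1    1    1    7
3    8   NA   NA
2    5    8    6

merge sorts its result by the column used for the mapping. rank gives the position of each unique value within that sorted result, so using the ranks as row indices restores the order in which the values first appeared. Indexing with order(unique(foo$var1)) gives the same result for this data, but only because this particular permutation is its own inverse; in general the two differ. This assumes that at most one row per value of var1 survives the filter. See also ?rank.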
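Another way to restore the order of first appearance is to look up each key's row in the merged result with match. The sketch below uses a made-up data frame bar (not from the question) whose keys first appear in the order 3, 1, 2, and it assumes at most one row per key survives the filter.

```r
# Made-up data whose keys first appear in the order 3, 1, 2
bar <- data.frame(var1 = c(3, 3, 1, 2, 2),
                  var2 = c(10, 11, 12, 13, 14),
                  var3 = c(9, 1, 1, 2, 8))

keys <- unique(bar$var1)                       # 3, 1, 2

# merge() returns its result sorted by var1: 1, 2, 3
m <- merge(bar[bar$var3 >= 5, ], data.frame(var1 = keys),
           by = "var1", all.y = TRUE)

# For each key, in original order, find its row in the merged result
restored <- m[match(keys, m$var1), ]
restored
```

Because match looks keys up by value rather than by position, it does not rely on how merge happened to sort the rows.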

Answer 2

After you do:

foo$var3[ foo$var3 < 5 ] = NA
foo$var2[ is.na(foo$var3) ] = NA

You need to remove the rows that contain NA and also repeat a value of var1 seen earlier:

foo[!(!complete.cases(foo) & duplicated(foo$var1)), ]

Think of this line as identifying the rows that contain NA values AND have a duplicated var1 value, then selecting everything else.

Edit: If the first row in the data frame for a given value of var1 has a value of var3 that you want to exclude, my solution doesn't work, because duplicated only flags the second and later occurrences, so that first NA row is kept. You'll need to order the data frame first to make sure that the complete cases come first:

foo <- foo[order(foo$var2),]   # NA sorts last; ordering on var3 works equally well
foo[!(!complete.cases(foo) & duplicated(foo$var1)), ]
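Putting the steps of this answer together on the question's data gives the following self-contained sketch. Note that the result keeps the row names from the reordered data frame, so the rows come out as 1, 7, 3 rather than in their original sequence.

```r
foo <- data.frame(var1 = c(1, 1, 8, 8, 5, 5, 5),
                  var2 = c(1, 2, 3, 2, 4, 6, 8),
                  var3 = c(7, 1, 1, 1, 1, 1, 6))

# Blank out the values that fail the condition
foo$var3[foo$var3 < 5] <- NA
foo$var2[is.na(foo$var3)] <- NA

# order() puts NA last, so complete rows precede incomplete ones
foo <- foo[order(foo$var2), ]

# Drop rows that are incomplete AND repeat an earlier var1 value
res <- foo[!(!complete.cases(foo) & duplicated(foo$var1)), ]
res
```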

Answer 3

rbind(r <- subset(foo, (foo$var3>=5)), 
      unique(transform(subset(foo, !var1%in%r$var1), var2=NA, var3=NA)))

Step by step:

r <- subset(foo, (foo$var3>=5))

r2 <- subset(foo, !var1%in%r$var1) # rows whose var1 does not occur in r
r3 <- transform(r2, var2=NA, var3=NA) # replace var2 and var3 with NA
r4 <- unique(r3) # remove duplicates

rbind(r, r4) # bind them
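If the same operation is needed more than once, the steps can be wrapped in a small function. This is a sketch under assumptions: the function name keep_or_na and its threshold argument are invented here, the column names var1, var2 and var3 are hard-coded, and bracket indexing replaces subset because subset's non-standard evaluation can be awkward inside functions.

```r
# Hypothetical helper: keep rows meeting the threshold, and add one NA row
# for every var1 value that would otherwise disappear
keep_or_na <- function(df, threshold = 5) {
  kept <- df[which(df$var3 >= threshold), ]
  lost <- df[!(df$var1 %in% kept$var1), ]
  rbind(kept, unique(transform(lost, var2 = NA, var3 = NA)))
}

foo <- data.frame(var1 = c(1, 1, 8, 8, 5, 5, 5),
                  var2 = c(1, 2, 3, 2, 4, 6, 8),
                  var3 = c(7, 1, 1, 1, 1, 1, 6))

res <- keep_or_na(foo)
res
```

The sketch is only exercised on the question's data; inputs where every var1 value has a surviving row are not covered here.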

Answer 4

Here's a way using the plyr package functions ddply and colwise, plus the base subset function (load plyr first with library(plyr)). First define a helper function null2na:

null2na <- function(x) if ( length(x) == 0 ) NA else x

Next define the function filter that we want to apply to each sub-data-frame that has a specific value for var1:

filter <- function(df) cbind( data.frame( var1 = df[1,1]),
                              colwise(null2na) (subset(df, var3 >= 5)[,-1]))

Now do the ddply on foo by var1:

> ddply(foo, .(var1), filter)
  var1 var2 var3
1    1    1    7
2    5    8    6
3    8   NA   NA
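The same split-apply-combine idea can be sketched in base R for cases where plyr is not available. This is not the answerer's code: the function name filter_group is made up for illustration, and it builds the NA row directly instead of going through null2na and colwise.

```r
foo <- data.frame(var1 = c(1, 1, 8, 8, 5, 5, 5),
                  var2 = c(1, 2, 3, 2, 4, 6, 8),
                  var3 = c(7, 1, 1, 1, 1, 1, 6))

# Hypothetical per-group function: takes the rows of one var1 group and
# returns the rows that pass, or a single NA row if none do
filter_group <- function(df) {
  ok <- df[which(df$var3 >= 5), ]
  if (nrow(ok) > 0) ok else data.frame(var1 = df$var1[1], var2 = NA, var3 = NA)
}

# split() breaks foo up by var1; do.call(rbind, ...) stitches the pieces together
res <- do.call(rbind, lapply(split(foo, foo$var1), filter_group))
rownames(res) <- NULL
res
```

As with ddply, the groups come back sorted by var1 rather than in order of first appearance.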

Answer 5

Try this:

foo <- data.frame(var1= c(1, 1, 2, 3, 3, 4, 4, 5), 
     var2=c(9, 5, 13, 9, 12, 11, 13, 9), 
     var3=c(6, 8, 3, 6, 4, 7, 2, 9))
f2=foo[which(foo$var3>5),]

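# first row of each var1 value with no match in f2 (avoids duplicate NA rows)
missing = which(!(foo$var1 %in% f2$var1) & !duplicated(foo$var1))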
f3 = rbind(f2, list(foo$var1[missing], rep(NA, length(missing)),rep(NA,length(missing))))
f3[order(f3$var1),]

The last line is only needed if you care about the order (assuming that the data is ordered on var1 in the first place).


Answer 1: score 10, accepted, 2011-01-16 12:31:41
Answer 2: score 3, 2011-01-15 21:36:19
Answer 3: score 2, 2011-01-16 03:26:45
Answer 4: score 1, 2011-01-16 03:42:27
Answer 5: score 0, 2011-01-15 21:13:52
