Find a subset using lapply/apply in a data frame

I have two data frames, cell and support (shown below). I want the subset of cell defined by the following condition: find all the rows of support that match vector b element-wise, and return the corresponding rows of cell.

a<-c(0,1,0)
b<-c(0,0,1)
level = c(3,2,4)
zero = c(1,2,1)


cell <- do.call(expand.grid, lapply(level, seq))    #create all cell
support <- t(apply(cell, 1, function(x) +(x != zero)))


> cell
   Var1 Var2 Var3
1     1    1    1
2     2    1    1
3     3    1    1
4     1    2    1
5     2    2    1
6     3    2    1
7     1    1    2
8     2    1    2
9     3    1    2
10    1    2    2
11    2    2    2
12    3    2    2
13    1    1    3
14    2    1    3
15    3    1    3
16    1    2    3
17    2    2    3
18    3    2    3
19    1    1    4
20    2    1    4
21    3    1    4
22    1    2    4
23    2    2    4
24    3    2    4
> support
      Var1 Var2 Var3
 [1,]    0    1    0
 [2,]    1    1    0
 [3,]    1    1    0
 [4,]    0    0    0
 [5,]    1    0    0
 [6,]    1    0    0
 [7,]    0    1    1
 [8,]    1    1    1
 [9,]    1    1    1
[10,]    0    0    1
[11,]    1    0    1
[12,]    1    0    1
[13,]    0    1    1
[14,]    1    1    1
[15,]    1    1    1
[16,]    0    0    1
[17,]    1    0    1
[18,]    1    0    1
[19,]    0    1    1
[20,]    1    1    1
[21,]    1    1    1
[22,]    0    0    1
[23,]    1    0    1
[24,]    1    0    1
> hD<-lapply(1:nrow(cell), function (x) cell[which(sum(support[x,]==b)==3),])
> do.call(rbind, hD)
  Var1 Var2 Var3
1    1    1    1
2    1    1    1
3    1    1    1

I tried to use lapply, but I am not getting the expected output. My output should be rows 10, 16, and 22 of cell (shown below), because rows 10, 16, and 22 of support match vector b exactly. I do not want to use any loop.

  Var1 Var2 Var3
1    1    2    2
2    1    2    3
3    1    2    4
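
For readers wondering why the attempt above returns row 1 three times: sum(support[x,]==b)==3 is a single TRUE or FALSE, and which() of a single TRUE is 1 (integer(0) for FALSE). Each matching x therefore selects cell[1,] instead of cell[x,]. A minimal loop-free sketch that keeps the per-row idea, using vapply to build a logical index (vapply and the name keep are my choices here, not part of the original post):

```r
b <- c(0, 0, 1)
level <- c(3, 2, 4)
zero <- c(1, 2, 1)

cell <- do.call(expand.grid, lapply(level, seq))
support <- t(apply(cell, 1, function(x) +(x != zero)))

# One TRUE/FALSE per row of support: does the whole row equal b?
keep <- vapply(seq_len(nrow(support)),
               function(i) all(support[i, ] == b),
               logical(1))

# Index cell with the logical vector itself, not with which() of a scalar
result <- cell[keep, ]
result   # rows 10, 16 and 22 of cell
```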

Here is another base R option

subset(cell,Reduce(`&`,as.data.frame(t(t(support)==b))))

or

subset(cell,Reduce(`&`,as.data.frame(support == t(replicate(nrow(support),b)))))

which gives

   Var1 Var2 Var3
10    1    2    2
16    1    2    3
22    1    2    4
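
The double transpose in the first variant is needed because R recycles a shorter vector down the columns of a matrix, not across its rows. A small sketch of the difference, reusing the question's data (the names naive and rowwise are only for illustration):

```r
b <- c(0, 0, 1)
level <- c(3, 2, 4)
zero <- c(1, 2, 1)

cell <- do.call(expand.grid, lapply(level, seq))
support <- t(apply(cell, 1, function(x) +(x != zero)))

# Naive comparison: b is recycled along each column of support,
# so elements of b are NOT lined up with the columns Var1..Var3
naive <- which(rowSums(support == b) == length(b))

# After t(), every column holds one original row, so recycling b
# down the columns compares it against each row; t() again restores the shape
rowwise <- t(t(support) == b)
correct <- which(rowSums(rowwise) == length(b))

correct   # 10 16 22
```

The naive version selects a different set of rows, which is why both answers above reshape one side of the comparison first.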

We can also do

cell[!rowSums(support != b[col(support)]),]
#   Var1 Var2 Var3
#10    1    2    2
#16    1    2    3
#22    1    2    4
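
A short sketch of what b[col(support)] does, on a small made-up matrix m (not from the original post): col() returns the column index of every element, so indexing b with it expands b to one value per matrix element, in the same column-major order as the matrix.

```r
# Toy matrix, assumed for illustration only
m <- matrix(c(0, 0, 1,
              1, 0, 1,
              0, 0, 1), nrow = 3, byrow = TRUE)
b <- c(0, 0, 1)

# One element of b per element of m, column by column
expanded <- b[col(m)]

# Number of positions in each row that differ from b
mismatches <- rowSums(m != expanded)

# A row matches b exactly when it has zero mismatches
keep <- !mismatches
keep   # TRUE FALSE TRUE
```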

You can compare b to each row of support by transposing support, then select the rows of cell where all values match.

cell[colSums(t(support) == b) == length(b), ]

#   Var1 Var2 Var3
#10    1    2    2
#16    1    2    3
#22    1    2    4

This can also be done using sweep:

cell[rowSums(sweep(support, 2, b, `==`)) == length(b), ]

To compare with both a and b, we can match them individually:

cell[colSums(t(support) == b) == length(b) | 
     colSums(t(support) == a) == length(a), ]

Or use lapply:

cell[Reduce(`|`, lapply(list(a, b), function(x) 
           colSums(t(support) == x) == length(x))), ]
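
If there are more than two patterns, the lapply form can be wrapped in a small helper. This is only a sketch: rows_matching is a hypothetical name, and the length check is an addition of mine to guard against silent recycling when a pattern has the wrong length.

```r
a <- c(0, 1, 0)
b <- c(0, 0, 1)
level <- c(3, 2, 4)
zero <- c(1, 2, 1)

cell <- do.call(expand.grid, lapply(level, seq))
support <- t(apply(cell, 1, function(x) +(x != zero)))

# Hypothetical helper: TRUE for each row of mat equal to ANY of the patterns
rows_matching <- function(mat, patterns) {
  hits <- lapply(patterns, function(p) {
    # Refuse patterns that would be recycled instead of compared column by column
    stopifnot(length(p) == ncol(mat))
    colSums(t(mat) == p) == length(p)
  })
  Reduce(`|`, hits)
}

cell[rows_matching(support, list(a, b)), ]   # rows 1, 10, 16 and 22 of cell
```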

Here is an apply-based approach that works for both requirements.

cell[apply(support, 1, function(x) all(x == b)), ]
#    Var1 Var2 Var3
# 10    1    2    2
# 16    1    2    3
# 22    1    2    4

cell[apply(support, 1, function(x) all(x == b) | all(x == a)), ]
#    Var1 Var2 Var3
# 1     1    1    1
# 10    1    2    2
# 16    1    2    3
# 22    1    2    4
