How to avoid using nested lapply in R?

I am looking for an efficient alternative to nested lapply; I think nested structures are frowned upon in the R community. Can anyone propose ideas or an approach to avoid using nested lapply in a custom function?

Here is a quick reproducible example:

Simulated data:

a <- data.frame(
  start=seq(1, by=9, len=18), stop=seq(6, by=9, len=18),
  ID=letters[seq(1:18)], score=sample(1:25, 18, replace = FALSE))
b <- data.frame(
  start=seq(2, by=11, len=20), stop=seq(8, by=11, len=20),
  ID=letters[seq(1:20)], score=sample(1:25, 20, replace = FALSE))
c <- data.frame(
  start=seq(4, by=11, len=25), stop=seq(9, by=11, len=25),
  ID=letters[seq(1:25)], score=sample(1:25, 25, replace = FALSE))

The code where I used nested lapply, which I want to avoid:

a.big <- a[a$score >10,]
a.sml <- a[(a$score > 6 & a$score <= 10),]
a.non <- a[a$score < 6,]

a_new <- list('big'=a.big, 'sml'=a.sml)
tar.list <- list(b,c)

test <- lapply(a_new, function(ele_) {
  re <- lapply(tar.list, function(li) {
    out <- base::setdiff(ele_, li)
    return(out)
  })
})

Objective:

Avoid using nested lapply and find an efficient alternative. I would like a better representation of its output, one that is easy and fast to reproduce and that allows fast, easy downstream computation. Is there a general approach to do this?

How can I avoid using nested lapply in test? Can anyone propose ideas to get around this issue? Thanks.

Best regards,

Jeff

I'm not sure what you really want. But if you want the setdiff of all combinations of both lists, you can use something like this:

# all combinations
a <- expand.grid(seq_along(a_new), seq_along(tar.list))
a
  Var1 Var2
1    1    1
2    2    1
3    1    2
4    2    2
# apply setdiff row-wise over all combinations
apply(a, 1, function(x, y, z){ setdiff(y[x[1]], z[x[2]])}, a_new, tar.list)[1:2]
[[1]]
[[1]][[1]]
   start stop ID score
2     10   15  b    21
3     19   24  c    12
6     46   51  f    23
9     73   78  i    15
10    82   87  j    19
11    91   96  k    25
13   109  114  m    11
16   136  141  p    17
17   145  150  q    18
18   154  159  r    24


[[2]]
[[2]][[1]]
   start stop ID score
5     37   42  e     9
14   118  123  n     8
15   127  132  o     7

Using double [[ ]] brackets gives you cleaner output: a single flat list instead of a list of one-element lists.

apply(a, 1, function(x, y, z){ setdiff(y[[x[1]]],z[[x[2]]])}, a_new, tar.list)

[[1]]
   start stop ID score
2     10   15  b    21
3     19   24  c    12
6     46   51  f    23
9     73   78  i    15
10    82   87  j    19
11    91   96  k    25
13   109  114  m    11
16   136  141  p    17
17   145  150  q    18
18   154  159  r    24

[[2]]
   start stop ID score
5     37   42  e     9
14   118  123  n     8
15   127  132  o     7

[[3]]
   start stop ID score
2     10   15  b    21
3     19   24  c    12
6     46   51  f    23
9     73   78  i    15
10    82   87  j    19
11    91   96  k    25
13   109  114  m    11
16   136  141  p    17
17   145  150  q    18
18   154  159  r    24

[[4]]
   start stop ID score
5     37   42  e     9
14   118  123  n     8
15   127  132  o     7
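
A possible variation on the same idea, offered only as a sketch: iterate over the index grid with Map instead of apply, and label each result with the pair it came from so that downstream code can look results up by name. The toy integer vectors, the names given to tar.list, and the helper name idx are assumptions made for illustration; they are not part of the question's data.

```r
# Toy stand-ins for the question's lists (assumption: plain integer
# vectors instead of data frames, so the setdiff() results are easy to check).
a_new    <- list(big = c(1L, 2L, 3L, 4L), sml = c(3L, 4L, 5L))
tar.list <- list(b = c(1L, 2L), c = c(4L, 5L))

# One row per (a_new, tar.list) combination; the first column varies fastest.
idx <- expand.grid(x = seq_along(a_new), y = seq_along(tar.list))

# Map() walks the two index columns in parallel and always returns a list.
# [[ ]] extracts the element itself rather than a one-element list.
res <- Map(function(i, j) setdiff(a_new[[i]], tar.list[[j]]), idx$x, idx$y)

# Label each result with the pair it came from, e.g. "big.b".
names(res) <- paste(names(a_new)[idx$x], names(tar.list)[idx$y], sep = ".")

res[["big.c"]]  # elements of a_new$big that are not in tar.list$c
```

Because the result is a flat, named list, a later step can pick a single pair by name or loop over all pairs without nested indexing.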

Is that what you want?

outd <- function(ele_, li) base::setdiff(ele_, li)
mapply(outd, a_new, tar.list, SIMPLIFY = FALSE)

> mapply(outd, a_new, tar.list, SIMPLIFY = FALSE)
$big
   start stop ID score
1      1    6  a    12
6     46   51  f    20
8     64   69  h    24
9     73   78  i    13
10    82   87  j    11
12   100  105  l    19
14   118  123  n    16
15   127  132  o    18
16   136  141  p    22
17   145  150  q    23
18   154  159  r    14

$sml
  start stop ID score
2    10   15  b     9
7    55   60  g    10

Edit

In the previous case, mapply applies the function to corresponding pairs of list elements (first with first, second with second), not to every combination.

If we take the idea from outer and expand both lists, we get (not sure whether it will work in other cases):
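
To make the pairing behaviour concrete, here is a small sketch with toy integer vectors (an assumption for illustration; x and y are hypothetical names, not the question's data). Two lists of length two produce two results, not four:

```r
# Toy inputs: two named "query" vectors and two "target" vectors.
x <- list(big = c(1L, 2L, 3L, 4L), sml = c(3L, 4L, 5L))
y <- list(c(1L, 2L), c(4L, 5L))

# mapply() pairs elements by position: x[[1]] with y[[1]], x[[2]] with y[[2]].
paired <- mapply(setdiff, x, y, SIMPLIFY = FALSE)

length(paired)  # one result per position, not one per combination
names(paired)   # names are taken from the first list
```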

bY <- rep(tar.list, rep.int(length(a_new), length(tar.list)))
bX <- rep(a_new, times = ceiling(length(bY)/length(a_new)))
mapply(outd, bX, bY, SIMPLIFY = FALSE)

> mapply(outd, bX, bY, SIMPLIFY = FALSE)
$big
   start stop ID score
1      1    6  a    25
2     10   15  b    23
4     28   33  d    14
7     55   60  g    19
9     73   78  i    20
10    82   87  j    21
12   100  105  l    13
13   109  114  m    12
14   118  123  n    22
16   136  141  p    15
17   145  150  q    18

$sml
   start stop ID score
6     46   51  f     9
8     64   69  h     8
18   154  159  r    10

$big
   start stop ID score
1      1    6  a    25
2     10   15  b    23
4     28   33  d    14
7     55   60  g    19
9     73   78  i    20
10    82   87  j    21
12   100  105  l    13
13   109  114  m    12
14   118  123  n    22
16   136  141  p    15
17   145  150  q    18

$sml
   start stop ID score
6     46   51  f     9
8     64   69  h     8
18   154  159  r    10
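
Since the expansion above borrows its idea from outer, another option might be to call outer directly on the list indices. This is only a sketch under stated assumptions: the toy integer vectors, the names on tar.list, and the helper pair_diff are hypothetical. Here outer returns a list-matrix with one cell per combination, which avoids the duplicated $big and $sml names seen in the output above.

```r
# Toy stand-ins for the question's lists (assumption: integer vectors).
a_new    <- list(big = c(1L, 2L, 3L, 4L), sml = c(3L, 4L, 5L))
tar.list <- list(b = c(1L, 2L), c = c(4L, 5L))

outd <- function(ele_, li) base::setdiff(ele_, li)

# outer() expects a vectorized function, so wrap the per-pair call with
# Vectorize(); returning list(...) keeps each result as a single cell.
pair_diff <- Vectorize(function(i, j) list(outd(a_new[[i]], tar.list[[j]])))

# Rows correspond to a_new, columns to tar.list.
grid <- outer(seq_along(a_new), seq_along(tar.list), pair_diff)
dimnames(grid) <- list(names(a_new), names(tar.list))

grid[["big", "c"]]  # result for a_new$big versus tar.list$c
```

Each cell is addressed by a row and column name, so every combination stays uniquely identifiable in later steps.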
