Variable not recognized in a for loop nested inside lapply

Question

I have the following data:

library(data.table)
set.seed(42)
dat <- list(data.table(id=1:10, group=rep(1:2, each=5), x=rnorm(10)), 
            data.table(id=1:10, group=rep(1:2, each=5), x=rnorm(10)))

I want to apply this function to each list element and, within each element, to each group.

subs = function(x, ..., verbose=FALSE){
  L   = substitute(list(...))[-1]
  mon = data.table(cond = as.character(L))[, skip := FALSE]

  for (i in seq_along(L)){
    d = eval( substitute(x[cond, verbose=v], list(cond = L[[i]], v = verbose)) )
    if (nrow(d)){
      x = d
    } else {
      mon[i, skip := TRUE]
    }    
  }
  #print(mon)
  return(x)
}

However, when I run this code:

# fails: object 'cutoff' not found
out <- lapply(1:2, function(h){
    res <- list()
    d <- dat[[h]] 
    for(k in 1:2){
        g <- d[group==k]
        cutoff <- 1
        print(cutoff)
        res[[k]] <- subs(g, x>cutoff)
    }
    res
})

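I get an error saying that the object cutoff cannot be found, even though it is printed correctly. However, when I run the same for loop outside lapply(), it seems to work.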

d1 <- dat[[1]]
s <- list()
for(k in 1:2){
    g <- d1[group==k]
    cutoff <- 1
    s[[k]] <- subs(g, x>cutoff)
}

> s
[[1]]
   id group        x
1:  1     1 1.370958

[[2]]
   id group        x
1:  7     2 1.511522
2:  9     2 2.018424

This makes me suspect that wrapping the loop in lapply() causes the error, but I find it hard to see what the error is and how to fix it.

Edit

Data with two variables:

set.seed(42)
dat <- list(data.table(id=1:10, group=rep(1:2, each=5), x=rnorm(10), y=11:20), 
            data.table(id=1:10, group=rep(1:2, each=5), x=rnorm(10), y=11:20))

Expected result

[[1]]
   id group          x   y
1:  9     2  2.0184237  19
2:  1     1  1.3709584  11
3:  2     1 -0.5646982  12
4:  3     1  0.3631284  13
5:  4     1  0.6328626  14
6:  5     1  0.4042683  15

[[2]]
   id group          x   y
1:  2     1  2.2866454  12
2: 10     2  1.3201133  20

Answer 1

If you use non-standard evaluation, you always pay a price. This is a scoping issue.

It works like this:

subs = function(x, ..., verbose=FALSE){
  L   = substitute(list(...))[-1]
  mon = data.table(cond = as.character(L))[, skip := FALSE]

  for (i in seq_along(L)){
    d = eval( substitute(x[cond,, #needed to add this comma, don't know why
                           verbose=v], list(cond = L[[i]], v = verbose)))
    if (nrow(d)){
      x = d
    } else {
      mon[i, skip := TRUE]
    }    
  }
  #print(mon)
  return(x)
}

out <- lapply(1:2, function(h){
  res <- list()
  d <- dat[[h]] 
  for(k in 1:2){
    g <- d[group==k]

    cutoff <- 1
    res[[k]] <- eval(substitute(subs(g, x>cutoff), list(cutoff = cutoff)))
  }
  res
})
#works
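
Why injecting the value of cutoff helps can be seen in a small base R sketch that does not involve data.table at all. The helper names eval_here, eval_in_caller and caller are hypothetical and only illustrate the lookup rule; subs() appears to run into the same rule, because the condition is evaluated from inside subs(), whose enclosing environment is the global environment rather than the function passed to lapply().

```r
# Hypothetical helper mimicking subs(): captures an expression and
# evaluates it from inside its own frame.
eval_here <- function(expr) {
  e <- substitute(expr)
  # Lookup order: this function's frame, then the environment where
  # eval_here was defined. The caller's local variables are not searched.
  eval(e)
}

# Hypothetical variant: evaluates the captured expression in the caller's frame.
eval_in_caller <- function(expr) {
  e <- substitute(expr)
  eval(e, parent.frame())
}

caller <- function(f) {
  cutoff <- 1        # exists only inside caller()
  f(cutoff + 1)
}

# The first call fails because cutoff is not visible from eval_here();
# tryCatch() turns the error into its message so the script keeps running.
res_here   <- tryCatch(caller(eval_here), error = function(e) conditionMessage(e))
res_caller <- caller(eval_in_caller)   # 2

print(res_here)
print(res_caller)
```

If cutoff is defined in the global environment, as in the loop outside lapply(), eval_here() finds it as well, which matches the behaviour described in the question.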

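Is there a specific reason not to use data.table's by argument?

Edit:

Background: the point of subs() is to apply several conditions in turn (if several are passed to it), skipping any condition that would result in an empty subset.

I would use a different approach: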

subs = function(x, ..., verbose=FALSE){
  L   = substitute(list(...))[-1]

  for (i in seq_along(L)){
    d = eval( substitute(x[cond, , verbose=v], list(cond = L[[i]], v = verbose)))
    x <- rbind(d, x[!d, on = "group"]) 
  }

  return(x)
}

out <- lapply(dat, function(d){

  cutoff <- 2 #to get empty groups

  eval(substitute(subs(d, x>cutoff), list(cutoff = cutoff)))

})

#[[1]]
#   id group          x
#1:  9     2  2.0184237
#2:  1     1  1.3709584
#3:  2     1 -0.5646982
#4:  3     1  0.3631284
#5:  4     1  0.6328626
#6:  5     1  0.4042683
#
#[[2]]
#   id group          x
#1:  2     1  2.2866454
#2:  6     2  0.6359504
#3:  7     2 -0.2842529
#4:  8     2 -2.6564554
#5:  9     2 -2.4404669
#6: 10     2  1.3201133

Note that this does not preserve the row order.

Another option that preserves the order:

subs = function(x, ..., verbose=FALSE){
  L   = substitute(list(...))[-1]

  for (i in seq_along(L)){
    x = eval( substitute(x[, {
      res <- .SD[cond];
      if (nrow(res) > 0) res else .SD 
    }, by = "group", verbose=v], list(cond = L[[i]], v = verbose)))
  }

  return(x)
}

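The by variable could be passed as a function argument and then substituted together with the condition.

I have not yet run benchmarks comparing the efficiency of these two approaches.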
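
A minimal sketch of that idea, assuming data.table is installed. The function name subs_by and its by argument are hypothetical additions, not part of the original code, and the condition uses a literal number so that the scoping problem from the question does not come into play.

```r
library(data.table)

# Sketch: the order-preserving version with a hypothetical `by` argument
# that is substituted into the call together with the condition.
subs_by = function(x, ..., by = "group", verbose = FALSE){
  L = substitute(list(...))[-1]

  for (i in seq_along(L)){
    x = eval( substitute(x[, {
      res <- .SD[cond]
      # keep the whole group if the condition would empty it
      if (nrow(res) > 0) res else .SD
    }, by = b, verbose = v], list(cond = L[[i]], b = by, v = verbose)))
  }

  return(x)
}

# Small made-up table: group 1 has one row above 1, group 2 has none.
dt  <- data.table(id = 1:4, group = c(1L, 1L, 2L, 2L), x = c(0.5, 2, 0.1, 0.2))
res <- subs_by(dt, x > 1, by = "group")
print(res)
```

Because the grouping column is returned first, the column order of the result differs from the input.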

Solution 1
3 Accepted 2019-08-21 13:13:58