How to replace nested loops with lapply in R?

Question

Good afternoon,

I developed this R function to hash data into buckets:

#   The used packages 
    library("pacman")
    pacman::p_load(dplyr, tidyr, devtools, MASS, pracma, mvtnorm, interval, intervals) 
    pacman::p_load(sprof, RDocumentation, helpRFunctions, foreach , philentropy , Rcpp , RcppAlgos) 


  hash<-function(v,p){
  if(dot(v,p)>0) return(1) else (0)   }

  LSH_Band<-function(data,K ){

  # We retrieve numerical columns of data 
  t<-list.df.var.types(data)
  df.r<-as.matrix(data[c(t$numeric,t$Intervals)])
  n=nrow(df.r)

  # we create a K*K matrix of draws from the standard normal distribution
  rn=array(rnorm(K*K,0,1),c(K,K))
  # we create a K*K matrix of integers drawn from a uniform distribution; integers are unique within each column
  rd=unique.array(array(unique(ceiling(runif(K*K,0,ncol(df.r)))),c(K,K)))

  buckets<-array(NA,c(K,n)) 
    for (i in 1:K) {
      for (j in 1:n) {
        buckets[i,j]<-hash(df.r[j,][rd[,i]],rn[,i])
      }
    }   
  return(buckets)   
}
> df.r
  age height salaire.1 salaire.2
1  27    180         0      5000
2  26    178         0      5000
3  30    190      7000     10000
4  31    185      7000     10000
5  31    187      7000     10000
6  38    160     10000     15000
7  39    158     10000     15000
> LSH_Band(df.r, 3 )
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1    1    1    1    1    1    1
[2,]    1    1    0    0    0    0    0
[3,]    0    0    0    0    0    0    0

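The dot function computes the scalar product of two vectors.

My LSH function takes one row of my data, then uses df.r[j,][rd[,i]] to extract part of that row. df.r[j,] is the j-th row of the data.
rd[,i]: rd is a K*K matrix of integers between 1 and ncol(df.r); each column of the matrix contains only unique integers.
rn[,i]: rn is a K*K matrix of values drawn from the N(0,1) distribution.
In the result table, observations are in columns and there are K rows. For the last row, I compute the scalar product between df.r[j,][rd[,K]] and rn[,K]; if the scalar product is positive, I get 1. rd[,K] and rn[,K] are used only for the last row of the result table, and for every observation in that row.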
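To make the description above concrete, here is a minimal sketch of how a single entry buckets[i, j] is computed on the sample data. The seed value, the use of sample.int to build rd, and the use of sum(v * p) in place of pracma::dot are assumptions made for illustration only; they are not taken from the function above.

```r
# Sample data from the question, as a numeric matrix
df.r <- matrix(
  c(27, 180,     0,  5000,
    26, 178,     0,  5000,
    30, 190,  7000, 10000,
    31, 185,  7000, 10000,
    31, 187,  7000, 10000,
    38, 160, 10000, 15000,
    39, 158, 10000, 15000),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("age", "height", "salaire.1", "salaire.2"))
)

K <- 3
set.seed(42)  # arbitrary seed, assumed here only for reproducibility

# Column i of rn holds the random weights used for bucket row i
rn <- matrix(rnorm(K * K), nrow = K)
# Column i of rd holds K distinct column indices of df.r (assumed construction)
rd <- sapply(seq_len(K), function(i) sample.int(ncol(df.r), K))

i <- K  # last row of the result table
j <- 1  # first observation

v <- df.r[j, ][rd[, i]]           # the K selected values of observation j
p <- rn[, i]                      # the matching weights
s <- sum(v * p)                   # scalar product of v and p
bucket_ij <- if (s > 0) 1 else 0  # 1 when the scalar product is positive
```

Repeating this for every pair (i, j) fills the K-by-n result table, which is exactly what the two nested loops do.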

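My question:

Is it possible to replace the loops over the variables i and j with the lapply function?

My real data will be very large, which is why I am asking.

Thank you!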

Answer 1

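What follows is a bit too long for a comment, so here are some hints, questions, and remarks:

First of all, I have to say I find it hard to understand what LSH_Band does. Some background would help here.
I don't understand the purpose of some functions; for example, helpRFunctions::list.df.var.types seems to just return the column names of data in a list. Note also that t$Intervals returns NULL for the sample data you provided, so I'm not sure what is going on there.
I also don't see the point of the function pracma::dot. The dot product of two vectors can be computed in base R with %*%; there is really no need for an additional package.
The function hash can be written more compactly as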
```
hash <- function(v, p) +(as.numeric(v %*% p) > 0)
```
This avoids the slow if conditional.
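As a quick sanity check, the sketch below compares the compact form with an if-based version on two small vectors. The name hash_if and the use of sum(v * p) in place of pracma::dot are assumptions made for illustration. The unary + simply converts the logical result of the comparison to an integer.

```r
# If-based form, with sum(v * p) standing in for pracma::dot (assumption)
hash_if <- function(v, p) if (sum(v * p) > 0) 1 else 0

# Compact form: v %*% p is a 1x1 matrix, as.numeric() drops its dimensions,
# and unary + turns TRUE/FALSE into 1/0
hash <- function(v, p) +(as.numeric(v %*% p) > 0)

v  <- c(1, 2, 3)
p1 <- c(1, 0, -1)  # scalar product: 1 + 0 - 3 = -2
p2 <- c(1, 1, 0)   # scalar product: 1 + 2 + 0 = 3

hash(v, p1)  # 0
hash(v, p2)  # 1

# Both forms agree on these inputs
same <- hash(v, p1) == hash_if(v, p1) && hash(v, p2) == hash_if(v, p2)
```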

Although I don't understand what you are trying to do, here are some tweaks to your code:

hash <-  function(v, p) +(as.numeric(v %*% p) > 0)

LSH_Band <- function(data, K, seed = NULL) {

    # We retrieve numerical columns of data
    data <- as.matrix(data[sapply(data, is.numeric)])
    # we create a K*K matrix of draws from the standard normal distribution
    if (!is.null(seed)) set.seed(seed)
    rn <- matrix(rnorm(K * K, 0, 1), nrow = K, ncol = K)
    # we create a K*K matrix of column indices; indices are unique within each column
    rd <- sapply(seq_len(K), function(col) sample.int(ncol(data), K))
    buckets <- matrix(NA, nrow = K, ncol = nrow(data))
    for (i in 1:K) {
        buckets[i, ] <- apply(data, 1, function(row) hash(row[rd[, i]], rn[, i]))
    }
    buckets
}

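When working with random numbers, always add an option to use a reproducible seed. This makes debugging a lot easier.
You can replace at least one for loop with apply (which, with MARGIN = 1, iterates over the rows of a matrix or array).
I removed all unnecessary package dependencies and replaced the functionality with base R functions.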
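Going one step further, the remaining loop over i can itself be written with lapply, and the inner apply over rows can be replaced by a single matrix product, because data[, rd[, i]] %*% rn[, i] yields all n scalar products for bucket row i at once. The following is a minimal sketch, not a tested drop-in replacement: the name LSH_Band_vec is hypothetical, and it assumes K <= ncol(data), which sample.int requires when sampling without replacement.

```r
LSH_Band_vec <- function(data, K, seed = NULL) {
  # Keep only the numeric columns and work on a matrix
  data <- as.matrix(data[sapply(data, is.numeric)])
  if (!is.null(seed)) set.seed(seed)

  # K*K matrix of standard normal weights
  rn <- matrix(rnorm(K * K, 0, 1), nrow = K, ncol = K)
  # K*K matrix of column indices, unique within each column
  rd <- sapply(seq_len(K), function(col) sample.int(ncol(data), K))

  # One list element per bucket row: the matrix product computes the
  # scalar product for every observation in a single call
  rows <- lapply(seq_len(K), function(i) {
    as.integer(data[, rd[, i], drop = FALSE] %*% rn[, i] > 0)
  })

  # Stack the K vectors into a K-by-n matrix
  do.call(rbind, rows)
}

# Usage on the sample data from the question
df <- data.frame(
  age       = c(27, 26, 30, 31, 31, 38, 39),
  height    = c(180, 178, 190, 185, 187, 160, 158),
  salaire.1 = c(0, 0, 7000, 7000, 7000, 10000, 10000),
  salaire.2 = c(5000, 5000, 10000, 10000, 10000, 15000, 15000)
)
res <- LSH_Band_vec(df, 3, seed = 1)
```

Whether this is noticeably faster depends on the size of the data; for large n the matrix product avoids one R-level function call per row, which is usually where apply spends its time.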

Solution 1
2, accepted, 2020-02-20 22:25:08
