Replacing apply with lapply

Question

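I am building a dataset that uses regular expressions to compute total counts for different combinations of words. Each row has a unique regular expression, which I want to check against another dataset to find how many times it occurs there.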

The first dataset (df1) looks like this:

   word1    word2               pattern
   air      10     (^|\\s)air(\\s.*)?\\s10($|\\s)
 airport    20   (^|\\s)airport(\\s.*)?\\s20($|\\s)
   car      30     (^|\\s)car(\\s.*)?\\s30($|\\s)

The other dataset (df2), which I want to match against, looks like this:

   sl_no    query
   1      air 10     
   2    airport 20   
   3    airport 20
   3    airport 20
   3      car 30

The final output I want should look like this:

   word1    word2   total_occ
   air      10      1
  airport   20      3
   car      30      1

I can do this in R using apply:

process <- function(x) {
  length(grep(x[["pattern"]], df2$query))
}

df1$total_occ <- apply(df1, 1, process)

But it takes a while, because my dataset is large.

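I found that the mclapply function from the parallel package can be used to run this kind of job on multiple cores, so as a first step I tried running it with lapply. It gives me this error: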

lapply(df,process)

Error in x[, "pattern"] : incorrect number of dimensions

Please let me know what I need to change for lapply to run correctly.

Answer 1

Why not just lapply() over pattern?

Here I have just pulled out your pattern column as a vector, but this could just as easily be df$pattern:

pattern <- c("(^|\\s)air(\\s.*)?\\s10($|\\s)",
             "(^|\\s)airport(\\s.*)?\\s20($|\\s)",
             "(^|\\s)car(\\s.*)?\\s30($|\\s)")

Using this data for df2:

txt <- "sl_no    query
   1      'air 10'     
   2    'airport 20'   
   3    'airport 20'
   3    'airport 20'
   3      'car 30'"
df2 <- read.table(text = txt, header = TRUE)

Then just iterate over pattern directly:

> lapply(pattern, grep, x = df2$query)
[[1]]
[1] 1

[[2]]
[1] 2 3 4

[[3]]
[1] 5
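
As a side note on the error in the question, here is a small sketch of why lapply() on a data frame behaves differently from apply(df1, 1, ...). A data frame is a list of columns, so lapply() hands each column (a plain vector) to the function rather than a row. The toy data frame below is made up for illustration and is not the asker's data.

```r
# Hypothetical toy data frame, for illustration only
toy <- data.frame(word1 = c("air", "car"),
                  word2 = c(10, 30),
                  stringsAsFactors = FALSE)

# lapply() walks over the columns of a data frame, one vector per call
col_lengths <- lapply(toy, length)
names(col_lengths)   # the column names, not row indices

# Each element passed to the function is a one-dimensional vector, so
# matrix-style indexing such as x[, "pattern"] has no second dimension to use
is.null(dim(toy$word1))
```

Iterating over the pattern vector itself, as above, avoids the row-versus-column question altogether.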

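If you want the more compact output your question asks for, run lengths() on the list returned by lapply() (thanks to @Frank for pointing out the new function lengths()). For example: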

lengths(lapply(pattern, grep, x = df2$query))

This gives:

> lengths(lapply(pattern, grep, x = df2$query))
[1] 1 3 1

You can add these counts to the original data with:

dfnew <- cbind(df1[, 1:2],
               Count = lengths(lapply(pattern, grep, x = df2$query)))
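
Since the question mentions mclapply() from the parallel package, the same iteration can probably be parallelised by swapping the function name. This is only a sketch under stated assumptions: n_cores is an assumed setting, the queries vector stands in for df2$query, and mclapply() relies on forking, so more than one core is not supported on Windows.

```r
library(parallel)

pattern <- c("(^|\\s)air(\\s.*)?\\s10($|\\s)",
             "(^|\\s)airport(\\s.*)?\\s20($|\\s)",
             "(^|\\s)car(\\s.*)?\\s30($|\\s)")

# Stand-in for df2$query from the question
queries <- c("air 10", "airport 20", "airport 20", "airport 20", "car 30")

# Assumed core count; forking is unavailable on Windows, so fall back to 1 there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Same call shape as lapply(); extra arguments are passed on to grep()
matches <- mclapply(pattern, grep, x = queries, mc.cores = n_cores)

# Number of matching rows per pattern
counts <- lengths(matches)
counts
```

Whether this is actually faster depends on the size of the data and the number of patterns, so it is worth timing against the plain lapply() version first.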

Solution 1: 3, accepted, 2015-06-17 16:07:29
