Nested for loop to function and lapply

Question

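I am trying to turn code that uses a nested for loop into a function and call it. I can easily put the code below inside a for loop, and my function runs that way. But I want to avoid the for loop inside my function and use lapply instead. How can I write the function(s), and the code that calls them, using lapply?

Code with the for loop: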

df <- data.frame(actual=c("reaok_oc giade_len","reaok_oc giade_len reaok_oc giade_len"),
                  Predicted = c("giade_len","reaok_oc giade_len reaok_oc giade_len"))

df[] <- lapply(df, as.character)
str(df)

all_acc<-NULL
for(s in 1:nrow(df)){
  sub_df1<-df[s,]
  actual_words<-unlist(strsplit(sub_df1$actual," "))
  all_count<-0
  for(g in 1:length(actual_words)){
    count_len<-ifelse(grep(actual_words[g],sub_df1$Predicted),1,0)
    all_count<-sum(all_count,count_len)
  }
  sub_acc<-all_count/length(actual_words)
  all_acc<-c(all_acc,sub_acc)
}

df$trans_acc<-all_acc
sensitivity=sum(df$trans_acc)/nrow(df)
sensitivity

Here is my non-working attempt at functions called with lapply:


a1 <- function(df){
  sub_df1<-df[s,]
  actual_words<-unlist(strsplit(sub_df1$actual," "))
  all_count<-0
}

a2 <- function(df){
  count_len<-ifelse(grep(actual_words[g],sub_df1$Predicted),1,0)
  all_count<-sum(all_count,count_len)
  sub_acc<-all_count/length(actual_words)
  all_acc<-c(all_acc,sub_acc)
df$trans_acc<-all_acc
sensitivity=sum(df$trans_acc)/nrow(df)
sensitivity
}


lapply(1:nrow(df) FUN = a1, lapply(1:length(actual_words) FUN = a2, actual_words,sub_aa1))

Answer 1

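In base R it is usually best to look for a "vectorized" solution (a single R function call) rather than an "iterative" one (one call per element). So, for example,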

for(s in 1:nrow(df)){
    sub_df1<-df[s,]
    actual_words<-unlist(strsplit(sub_df1$actual," "))
    ...

involves nrow(df) calls to strsplit(), but

actual <- strsplit(df$actual, " ")

involves only one call, yet performs the same transformation.

I also think that when you write
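To make this concrete, here is a small sketch with made-up strings (the vector `x` is hypothetical, not taken from the question). A single strsplit() call on a character vector returns a list with one element per input string, so no explicit loop over rows is needed.

```r
# Hypothetical input: two space-separated strings
x <- c("a b", "c d e")

# One vectorized call; the result is a list holding
# one character vector per element of x
words <- strsplit(x, " ")

words[[2]]      # the words of the second string
lengths(words)  # the number of words in each string
```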

    for(g in 1:length(actual_words)){
        count_len<-ifelse(grep(actual_words[g],sub_df1$Predicted),1,0)
        all_count<-sum(all_count,count_len)
    }

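you are really just looking for exact matches between the actual words and the predicted words. So you can split the predicted words as well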

predicted <- strsplit(df$Predicted, " ")

and compute sum(actual[[1]] %in% predicted[[1]]), and so on. Written as a function:

actual_in_predicted <- function(actual, predicted) {
    sum(actual %in% predicted)
}
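One caveat: %in% and grep() are not interchangeable in general. grep() treats its first argument as a pattern and matches substrings, whereas %in% tests whole-element equality. The sketch below uses hypothetical words, chosen so that the two approaches disagree; it is an illustration rather than part of the solution.

```r
actual_in_predicted <- function(actual, predicted) {
    sum(actual %in% predicted)
}

# Hypothetical words: "len" is a substring of "giade_len" but not equal to it
actual_words    <- c("len", "giade_len")
predicted_words <- c("giade_len")

# Exact matching: only "giade_len" is counted
exact_hits <- actual_in_predicted(actual_words, predicted_words)

# Substring matching, as grep()/grepl() would do: "len" is counted too
substring_hits <- sum(vapply(
    actual_words,
    function(w) grepl(w, "giade_len", fixed = TRUE),
    logical(1)
))
```

If substring matches should count, as they do in the original loop, %in% is not a drop-in replacement.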

A "for" loop could iterate over each element of actual and predicted

all_count <- integer()
for (i in 1:nrow(df))
    all_count[[i]] <- actual_in_predicted(actual[[i]], predicted[[i]])

but it is better to use mapply() to iterate over the elements of actual and predicted in parallel

all_count <- mapply(actual_in_predicted, actual, predicted)

Your variable all_acc is this vector of counts divided by the number of actual words in each comparison

all_acc <- all_count / lengths(actual)

The complete revised code uses a function to compare the actual and predicted words in each row, and mapply() to iterate over the rows.

actual_in_predicted <- function(actual, predicted) {
    sum(actual %in% predicted)
}

actual <- strsplit(df$actual, " ")
predicted <- strsplit(df$Predicted, " ")

all_count <- mapply(actual_in_predicted, actual, predicted)
all_acc <- all_count / lengths(actual)

df$trans_acc <- all_acc
sensitivity <- sum(df$trans_acc) / nrow(df)
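Since the question asked specifically about lapply(), here is a sketch of the same calculation written as a per-row function, using the data frame from the question. The helper name `row_accuracy` is made up for illustration. It uses exact word matching with %in%, as above, rather than the grep() matching of the original loop.

```r
df <- data.frame(
    actual = c("reaok_oc giade_len", "reaok_oc giade_len reaok_oc giade_len"),
    Predicted = c("giade_len", "reaok_oc giade_len reaok_oc giade_len"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: fraction of the actual words in row i
# that also occur among the predicted words of that row
row_accuracy <- function(i, data) {
    actual_words    <- strsplit(data$actual[i], " ")[[1]]
    predicted_words <- strsplit(data$Predicted[i], " ")[[1]]
    mean(actual_words %in% predicted_words)
}

# lapply() returns a list, so it needs unlist();
# vapply() returns a numeric vector directly and checks each result's type
acc_lapply <- unlist(lapply(seq_len(nrow(df)), row_accuracy, data = df))
acc_vapply <- vapply(seq_len(nrow(df)), row_accuracy, numeric(1), data = df)

sensitivity <- mean(acc_vapply)
```

seq_len(nrow(df)) is used instead of 1:nrow(df) because it yields an empty sequence for a data frame with zero rows.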

Answer 2

Perhaps we can use separate_rows

library(dplyr)
library(tidyr)
library(stringr)
df %>%
   separate_rows(actual, sep="_") %>%
   summarise(perc = mean(str_detect(Predicted, actual)))
#  perc
#1 0.75

It can be wrapped into a function

f1 <- function(data, act, pred) {
   data %>%
       separate_rows({{act}}, sep="_") %>%
       summarise(perc = mean(str_detect({{pred}}, {{act}})))
 }
f1(df, actual, Predicted)
#   perc
#1 0.75
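To see what this pipeline measures, here is a base R sketch of what I believe is the equivalent computation on the question's data. It is an approximation for illustration, not the answer's code. Two things differ from the loop in the question: the strings are split on "_" rather than on spaces, and the mean is taken over all pieces pooled together rather than per row. Fixed-string matching stands in for str_detect()'s regex matching, which is equivalent here only because the pieces contain no regex metacharacters.

```r
df <- data.frame(
    actual = c("reaok_oc giade_len", "reaok_oc giade_len reaok_oc giade_len"),
    Predicted = c("giade_len", "reaok_oc giade_len reaok_oc giade_len"),
    stringsAsFactors = FALSE
)

# Split each actual string on "_" (one list element per row)
pieces <- strsplit(df$actual, "_", fixed = TRUE)

# Repeat each row's Predicted string once per piece,
# mirroring the row duplication done by separate_rows()
predicted_rep <- rep(df$Predicted, lengths(pieces))

# Does each piece occur as a substring of its row's Predicted string?
hits <- mapply(grepl, unlist(pieces), predicted_rep,
               MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

# Pooled proportion over all pieces
perc <- mean(hits)
```

On this data the pooled proportion coincides with the per-row average from the question, but the two measures can differ on other inputs.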

Answer 1: score 4, accepted, 2019-12-25 16:56:18

Answer 2: score 2, 2019-12-25 15:52:21
