How can I make this loop run faster in R?
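I have a dictionary data frame, words.dict, with about 44,000 words. The code below is supposed to replace every word in dataset.num with its numeric ID from the dictionary.
dataset.num: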
dput(head(dataset.num))
c("rt breaking will from here forward be know as", "i hope you like wine and cocktails", "this week we are upgrading our servers there may be periodic disruptions to the housing application portal sorry for any inconvenience", "hanging out in foiachat anyone have fav management software on the gov t side anything from intake to redaction onwards", "they left out kourtney instead they let chick from big bang talk", "i am encoding film like for the billionth time already ")
words.dict:
dput(head(words.dict, 20))
structure(list(id = c(10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 3L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L), word = structure(1:20, .Label =c("already", "am", "and", "any", "anyone", "anything", "application", "are", "as", "bang", "be", "big", "billionth", "breaking", "chick", "cocktails","disruptions", "encoding", "fav", "film", "foiachat", "for", "forward", "from", "gov", "hanging", "have", "here", "hope", "housing", "i", "in", "inconvenience", "instead", "intake", "know", "kourtney", "left", "let", "like", "management", "may", "on", "onwards", "our", "out", "periodic", "portal", "redaction", "rt", "servers", "side", "software", "sorry", "t", "talk", "the", "there", "they", "this", "time", "to", "upgrading", "we", "week", "will", "wine", "you"), class = "factor")), .Names = c("id", "word"), row.names = c(10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 3L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L), class = "data.frame")
The loop:
for (i in 1:nrow(words.dict))
dataset.num <- gsub(paste0("\\b(", words.dict[i,"word"], ")\\b"),words.dict[i,1], dataset.num)
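After truncation, dataset.num is a character vector of almost 40,000 lines (about 20 words per line on average). The code works for small data, but it is too slow for large data when processing speed is limited.
What would you suggest to improve the efficiency and performance of this code?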
Here is another approach. I have not really tested it, but it may scale better.
sapply(strsplit(dataset.num, "\\s+"), function(y) {
i <- match(y, words.dict$word)
y[!is.na(i)] <- words.dict$id[na.omit(i)]
paste(y, collapse = " ")
})
#[1] "rt 22 will from here forward 3 know 18"
#[2] "i hope you like wine 12 24"
#[3] "this week we 17 upgrading our servers there may 3 periodic 25 to the housing 16 portal sorry for 13 inconvenience"
#[4] "hanging out in foiachat 14 have 27 management software on the gov t side 15 from intake to redaction onwards"
#[5] "they left out kourtney instead they let 23 from 20 19 talk"
#[6] "i 11 26 28 like for the 21 time 10"
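The same lookup can probably be pushed further by calling match() once over all words instead of once per line. The following is only a sketch under that assumption: replace_words is a hypothetical helper name, the tiny words.dict and dataset.num are made-up stand-ins for the real data, and it has not been benchmarked on the full 40,000 lines.

```r
# Hypothetical helper (not part of the original answer): flatten all tokens,
# look them up with a single match() call, then rebuild each line.
replace_words <- function(x, dict) {
  tokens <- strsplit(x, "\\s+")
  flat <- unlist(tokens, use.names = FALSE)

  # One vectorised lookup for every word in the data set.
  idx <- match(flat, as.character(dict$word))
  hit <- !is.na(idx)
  flat[hit] <- as.character(dict$id[idx[hit]])

  # Regroup the tokens by the line they came from and paste them back.
  grp <- factor(rep(seq_along(tokens), lengths(tokens)),
                levels = seq_along(tokens))
  unname(vapply(split(flat, grp), paste, character(1), collapse = " "))
}

# Made-up sample data, shaped like the question's objects.
words.dict <- data.frame(id = c(10L, 11L, 3L),
                         word = c("already", "am", "be"),
                         stringsAsFactors = TRUE)
dataset.num <- c("i am already here", "will be known")

result <- replace_words(dataset.num, words.dict)
print(result)
```

Words that are not in the dictionary are left unchanged, as in the sapply version above.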
Note that you can use stringi::stri_split to speed up the string splitting.
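As a rough illustration of that swap, the sketch below uses stringi::stri_split_regex, the regex variant that stri_split dispatches to. It assumes the stringi package may or may not be installed and falls back to base strsplit if it is not. split_words is a hypothetical wrapper, and the sample data is made up.

```r
# Hypothetical wrapper: use stringi when available, base R otherwise.
split_words <- function(x) {
  if (requireNamespace("stringi", quietly = TRUE)) {
    # omit_empty = TRUE drops the empty token that a trailing space
    # would otherwise produce (strsplit drops it on its own).
    stringi::stri_split_regex(x, "\\s+", omit_empty = TRUE)
  } else {
    strsplit(x, "\\s+")
  }
}

# Made-up sample data, shaped like the question's objects.
words.dict <- data.frame(id = c(10L, 11L),
                         word = c("already", "am"),
                         stringsAsFactors = TRUE)
dataset.num <- c("i am encoding film already ")

result <- sapply(split_words(dataset.num), function(y) {
  i <- match(y, words.dict$word)
  y[!is.na(i)] <- words.dict$id[na.omit(i)]
  paste(y, collapse = " ")
})
print(result)
```

Whether this is actually faster on the full data set would need to be timed, for example with system.time().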