Applying a Custom Function to a data.table
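I need to apply a custom function to every row of a data.table that has the columns freq (numeric) and ngram (text, with the words separated by _). I also supply three constant values, input1gramCount, input2gramCount and input3gramCount, which are not stored in the data.table.
When I try the code below, I get this warning: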
Warning message:
In if (MatchedLen == 4) { :
the condition has length > 1 and only the first element will be used
It seems to be complaining that 4 is not vectorized, but I want it to be a constant. Any pointers are welcome.
# Stupid Backoff
StupidBackoffScore <- function(freq, ngram, input1gramCount, input2gramCount, input3gramCount) {
  matchedLen = str_count(ngram, "_") + 1
  if (matchedLen == 4) {
    score = freq / input3gramCount
  } else if (matchedLen == 3) {
    score = 0.4 * freq / input2gramCount
  } else {
    # must be matchedLen 2
    score = 0.4 * 0.4 * freq / input1gramCount
  }
  return(score)
}
allGrams <- allGrams %>%
  mutate(stupidBOScore = StupidBackoffScore(frequency, ngram, input1gramCount, input2gramCount, input3gramCount))
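The warning is not about the constant 4: if() tests only a single value, but mutate passes the whole column to the function, so matchedLen is a vector. I would use the vectorized ifelse instead, like this: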
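library(data.table)
library(stringr)  # provides str_count()

setDT(dt)  # convert dt to a data.table by reference
# words in each n-gram = number of "_" separators + 1
dt[, matchedLen := str_count(ngram, "_") + 1]
# ifelse() is vectorized, so each row is scored by its own branch
dt[, score := ifelse(matchedLen == 4, freq / input3gramCount,
              ifelse(matchedLen == 3, 0.4 * freq / input2gramCount,
                     0.4 * 0.4 * freq / input1gramCount))]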
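I created matchedLen as a separate column for readability. If you don't need matchedLen, you can delete it once the score has been created (for example with dt[, matchedLen := NULL]).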
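To see the approach end to end, here is a minimal sketch on toy data. The three rows and the values given to input1gramCount, input2gramCount and input3gramCount are made up for illustration and do not come from the original question; it also assumes the data.table and stringr packages are installed.

```r
library(data.table)
library(stringr)

# Toy data: these rows and counts are illustrative assumptions only
dt <- data.table(
  freq  = c(10, 6, 4),
  ngram = c("a_b_c_d", "a_b_c", "a_b")
)
input1gramCount <- 100
input2gramCount <- 50
input3gramCount <- 20

# Number of words in each n-gram = number of "_" separators + 1
dt[, matchedLen := str_count(ngram, "_") + 1]

# Vectorized scoring: ifelse() evaluates the condition for every row
dt[, score := ifelse(matchedLen == 4, freq / input3gramCount,
              ifelse(matchedLen == 3, 0.4 * freq / input2gramCount,
                     0.4 * 0.4 * freq / input1gramCount))]

# Drop the helper column by reference once the score exists
dt[, matchedLen := NULL]

print(dt)
```

With these toy numbers the scores should come out as 0.5, 0.048 and 0.0064 for the 4-gram, 3-gram and 2-gram rows, and dt ends up with only the freq, ngram and score columns.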