AWS sentiment analysis gives different results on same string (R)
I have a data frame containing the lyrics of a Christmas song, which looks roughly like this:
df1 <- data.frame(line = c("I don't want a lot for Christmas",
"There is just one thing I need",
"I don't care about the presents",
"Underneath the Christmas tree",
"I just want you for my own"))
I have also installed the R package aws.comprehend.
Then I turn it into one long string:
lyrics_df1 <- df1 %>%
iconv(., from = "UTF-8", to = 'ASCII//TRANSLIT') %>%
str_c(.,collapse = " ")
When I now run detect_sentiment(lyrics_df1), the output is:
Index Sentiment Mixed Negative Neutral Positive
1 0 NEUTRAL 0.0003775794 0.291473 0.6762416 0.03190778
However, if I run the same code on the lyrics as a plain string, I get the following output:
detect_sentiment("I don't want a lot for Christmas
There is just one thing I need
I don't care about the presents underneath the Christmas tree
I just want you for my own")
Index Sentiment Mixed Negative Neutral Positive
1 0 NEUTRAL 0.2951728 0.2238117 0.3551461 0.1258695
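The output is now completely different!
How can I make sure I get the same result as when I paste the entire lyrics directly into the detect_sentiment() function?
When you use the first command, you send the entire data.frame to the function, which results in: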
df1 %>%
iconv(., from = "UTF-8", to = 'ASCII//TRANSLIT') %>%
str_c(.,collapse = " ")
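[1] "c(\"I don't want a lot for Christmas\", \"There is just one thing I need\", \"I don't care about the presents\", \"Underneath the Christmas tree\", \"I just want you for my own\")"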
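The wrapped, quoted string above is what base R produces when a whole data frame is coerced to character: each column is deparsed into a single string. The sketch below illustrates this without calling AWS. It uses only base R and a shortened two-line version of df1 (an assumption for brevity); as far as I can tell, iconv() performs the same as.character() coercion on non-character input.

```r
# Shortened stand-in for df1 from the question (assumption: two lines are enough)
df1 <- data.frame(line = c("I don't want a lot for Christmas",
                           "There is just one thing I need"),
                  stringsAsFactors = FALSE)

# Coercing the whole data frame deparses each column into one string
whole_df <- as.character(df1)

# Extracting the column keeps one string per lyric line
column <- df1$line

length(whole_df)            # one element per column
length(column)              # one element per row
startsWith(whole_df, "c(")  # the deparsed text begins with the c( wrapper
```

The extra `c(`, quotes, and commas become part of the text that is sent for sentiment detection.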
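The added symbols are probably what causes the difference in scores. To apply the functions directly to the variable, use pull:
library(dplyr)    # %>% and pull()
library(stringr)  # str_c()
df1 %>%
  pull(line) %>%    # extract the column as a character vector
  iconv(., from = "UTF-8", to = 'ASCII//TRANSLIT') %>%
  str_c(., collapse = " ")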
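[1] "I don't want a lot for Christmas There is just one thing I need I don't care about the presents Underneath the Christmas tree I just want you for my own"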
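Even after extracting the column, the collapsed string is not byte-identical to the literal pasted in the question: the literal contains newline characters where the collapsed version has spaces (and it spells "underneath" in lowercase). Whether such differences change the scores is not something the source confirms, but it is easy to check that two inputs really match before comparing results. A minimal base-R sketch, where normalize_ws is a hypothetical helper and the lyrics are shortened to two lines:

```r
# Hypothetical helper: collapse every run of whitespace (spaces, newlines) to one space
normalize_ws <- function(x) trimws(gsub("[[:space:]]+", " ", x))

# String built the same way as in the answer, using base paste() instead of str_c()
from_df <- paste(c("I don't want a lot for Christmas",
                   "There is just one thing I need"), collapse = " ")

# A pasted multi-line literal keeps its newline characters
pasted <- "I don't want a lot for Christmas
There is just one thing I need"

identical(from_df, pasted)                # FALSE: newline versus space
identical(from_df, normalize_ws(pasted))  # TRUE once whitespace is normalized
```

If identical() returns TRUE for the two inputs, any remaining difference in output cannot come from the text itself.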