AWS sentiment analysis gives different results on the same string (R)

Question

I have a data frame containing Christmas song lyrics that looks roughly like this:

df1 <- data.frame(line = c("I don't want a lot for Christmas", 
                           "There is just one thing I need", 
                           "I don't care about the presents", 
                           "Underneath the Christmas tree", 
                           "I just want you for my own"))

I have also installed the R package aws.comprehend.

Then I turn it into one long string:

lyrics_df1 <- df1 %>% 
  iconv(., from = "UTF-8", to = 'ASCII//TRANSLIT') %>% 
  str_c(.,collapse = " ")

When I now run detect_sentiment(lyrics_df1), the output is:

  Index Sentiment        Mixed Negative   Neutral   Positive
1     0   NEUTRAL 0.0003775794 0.291473 0.6762416 0.03190778

However, if I run the same code on just the lyrics as a string, I get the following output:

detect_sentiment("I don't want a lot for Christmas
There is just one thing I need
I don't care about the presents underneath the Christmas tree
I just want you for my own")

  Index Sentiment     Mixed  Negative   Neutral  Positive
1     0   NEUTRAL 0.2951728 0.2238117 0.3551461 0.1258695

The output is now completely different!

How can I make sure I get the same result as when I paste the whole lyrics directly into the detect_sentiment() function?

Answer 1

With the first command you pass the entire data.frame, not the line column, into the pipeline, which results in:

df1 %>% 
  iconv(., from = "UTF-8", to = 'ASCII//TRANSLIT') %>% 
  str_c(.,collapse = " ")

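[1] "c(\"I don't want a lot for Christmas\", \"There is just one thing I need\", \"I don't care about the presents\", \"Underneath the Christmas tree\", \"I just want you for my own\")"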
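
The wrapper in that output can be reproduced without calling AWS at all. The following is a minimal base R sketch, assuming R 4.0 or later (where data.frame() keeps strings as character) and assuming that iconv() coerces a data frame via as.character(), which is what the output above suggests. The shortened two-line data frame is only for illustration.

```r
# Shortened version of the data frame from the question
df1 <- data.frame(line = c("I don't want a lot for Christmas",
                           "There is just one thing I need"),
                  stringsAsFactors = FALSE)

# Coercing a data frame to character gives one deparsed string per
# column, not one string per row
deparsed <- as.character(df1)

n_strings   <- length(deparsed)                    # number of strings produced
has_wrapper <- startsWith(deparsed, "c(")          # the c( ... ) call is part of the text
has_quotes  <- grepl("\"", deparsed, fixed = TRUE) # so are literal quote characters

print(deparsed)
```

So the text that reaches detect_sentiment() contains the c( ... ) call and the quote characters in addition to the lyrics.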

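The added symbols are probably what causes the difference in scores. To apply the functions directly to the variable, extract it first with pull: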

df1 %>% 
  pull(line) %>% 
  iconv(., from = "UTF-8", to = 'ASCII//TRANSLIT') %>% 
  str_c(.,collapse = " ")

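[1] "I don't want a lot for Christmas There is just one thing I need I don't care about the presents Underneath the Christmas tree I just want you for my own"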
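
Even after pull, the collapsed string is not character-for-character the same as the one typed by hand in the question: the typed version separates lines with newlines, merges the third and fourth lines, and writes "underneath" in lower case. Whether those differences move the scores is not something this answer establishes, but it is cheap to check that two inputs really match before comparing results. A minimal base R sketch follows; df1$line and paste() stand in for pull(line) and str_c(), and the helper name normalise is hypothetical.

```r
df1 <- data.frame(line = c("I don't want a lot for Christmas",
                           "There is just one thing I need",
                           "I don't care about the presents",
                           "Underneath the Christmas tree",
                           "I just want you for my own"),
                  stringsAsFactors = FALSE)

# Base R equivalent of pull(line) followed by str_c(collapse = " ")
from_df <- paste(df1$line, collapse = " ")

# The string as typed in the question, with embedded newlines
typed <- "I don't want a lot for Christmas
There is just one thing I need
I don't care about the presents underneath the Christmas tree
I just want you for my own"

# Exact comparison of the two inputs
same_raw <- identical(from_df, typed)

# Hypothetical helper: collapse all whitespace runs to a single space
# and lower-case the text before comparing
normalise <- function(x) tolower(gsub("\\s+", " ", trimws(x)))

same_normalised <- identical(normalise(from_df), normalise(typed))

print(c(raw = same_raw, normalised = same_normalised))
```

If the raw comparison is FALSE, the two calls to detect_sentiment() were not given the same string, so differing scores would not be surprising.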

Solution 1
2 2022-12-11 17:14:39