Sentiment analysis in R for the German language

Question

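I'm trying to run a sentiment analysis in R on German text. However, the output doesn't look promising, because I can't find a way to do the analysis in German.

Do you have any suggestions for me?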

#libraries
library(tidyverse)
library(tokenizers)
library(stopwords)
library(sentimentr)

#load data
data <- tribble(
  ~content, 
  "Nimmt euch in Acht✌️#tage #periode #blu #hände #rot #blute #wald #fy #viral",
  "ich liebe uns #wortwitze #Periode #Tage #couplegoals",
  "Mit KadeZyklus bei Krämpfen gibt es jetzt endlich ein pflanzliches Helferlein gegen leichte Unterleibskrämpfe!",
  "Es ist wie es ist Jungs"
)

# count freq of words
words_as_tokens <- setNames(lapply(sapply(data$content, 
                                          tokenize_words, 
                                          stopwords = stopwords(language = "en", source = "smart")), 
                                   function(x) as.data.frame(sort(table(x), TRUE), stringsAsFactors = F)), data$content) 

# tidyverse's job
stop_german <- data.frame(word = stopwords::stopwords("de"), stringsAsFactors = FALSE)
df <- words_as_tokens %>%
  bind_rows(.id = "content") %>%
  rename(word = x) %>% 
  anti_join(stop_german, by = c("word"))

#sentiment
df$sentiment_score <- sapply(df$content, function(x) 
  mean(sentiment(x)$sentiment))

Answer 1

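You specified the wrong stopword source and the wrong language. The smart source does not include de as a language. If you run stopwords_getsources(), you get all available sources for stopwords. With stopwords_getlanguages(source = 'snowball') you will see that it does include de.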

Change your stopwords call accordingly and it will work.

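# count freq of words, dropping German stopwords during tokenization
words_as_tokens <- setNames(lapply(
  sapply(data$content,
    tokenize_words,
    # "snowball" provides a German ("de") list; "smart" is English-only
    stopwords = stopwords(language = "de", source = "snowball")
  ),
  # one frequency table per post, most frequent words first
  function(x) as.data.frame(sort(table(x), TRUE), stringsAsFactors = FALSE)
), data$content)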
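
To see which sources ship a German list before choosing one, the two helper functions mentioned above can be combined. This is only a minimal sketch: the variable names sources and supports_de are illustrative and not part of the stopwords package.

```r
library(stopwords)

# All stopword sources bundled with the stopwords package
sources <- stopwords_getsources()

# For each source, check whether "de" is among its supported languages
supports_de <- vapply(
  sources,
  function(src) "de" %in% stopwords_getlanguages(source = src),
  logical(1)
)

# Sources that offer a German list
sources[supports_de]
```

Any source printed by the last line should be usable in place of "snowball" together with language = "de" in the tokenize_words call.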

Solution 1
0 2022-12-04 09:21:12
