How do I use my own lexicon to analyze the sentiment of sentences in R?

Question

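I have built a new lexicon for analyzing the sentiment of sentences in R. I have used lexicons in R before, but I am not sure how to use one of my own. I managed to create lists of positive and negative words, count how many positive and negative words a sentence contains, and return the total. As the example below shows, this does not take into account the score assigned to each word.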

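Say I want to analyze the sentence "I am happy and kind of sad". Here is a sample of the word and score list (the real list will be much larger):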

happy, 1.3455
sad, -1.0552

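I want to match these words against the sentence and get the total score 1.3455 + (-1.0552), which in this case is 0.2903.

As highlighted in the example above, how would I use each word's actual score to produce an overall score when analyzing the sentiment of each sentence in R?

Many thanks, James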

Answer 1

You could start with the excellent tidytext package:

library(tidytext)
library(tidyverse)

First, set up your data and apply a few small transformations:

# data
df <- tibble(text = c('I am happy and kind of sad', 'sad is sad, happy is good'))

# add an ID
df <- tibble::rowid_to_column(df, "ID")

# rename the ID column
colnames(df)[1] <- "line"

> df
# A tibble: 2 x 2
   line text                      
  <int> <chr>                     
1     1 I am happy and kind of sad
2     2 sad is sad, happy is good 

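Then you can split the text into one word per row. This works like a "loop" applied to each sentence (each ID):

tidy <- df %>% unnest_tokens(word, text)
> tidy
# A tibble: 13 x 2
    line word 
   <int> <chr>
 1     1 i    
 2     1 am   
 3     1 happy
 4     1 and  
 5     1 kind 
 6     1 of   
 7     1 sad  
 8     2 sad  
 9     2 is   
10     2 sad  
11     2 happy
12     2 is   
13     2 good 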

Now define your brand new lexicon:

lexicon <- tibble(word = c('happy', 'sad'), scores = c(1.3455, -1.0552))
> lexicon
# A tibble: 2 x 2
  word  scores
  <chr>  <dbl>
1 happy   1.35
2 sad    -1.06
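
Since the real list will be much larger, you will probably not want to type it in by hand. Here is a minimal sketch, assuming the lexicon is stored as comma-separated "word, score" lines with no header row. The inline text stands in for a hypothetical file such as my_lexicon.csv, which is not part of the original question:

```r
# Stand-in for the contents of a hypothetical file, e.g. "my_lexicon.csv"
lexicon_text <- "happy, 1.3455
sad, -1.0552"

# read.csv() accepts text = ... in place of a file path
lexicon <- read.csv(text = lexicon_text, header = FALSE,
                    col.names = c("word", "scores"),
                    strip.white = TRUE, stringsAsFactors = FALSE)

# lower-case the words so they match the lower-cased tokens from unnest_tokens()
lexicon$word <- tolower(lexicon$word)
```

With a real file you would pass its path as the first argument instead of text = lexicon_text.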

Next, you can merge the lexicon with the tokenized data, so that every word found in the lexicon carries its score:

merged <- merge(tidy,lexicon, by = 'word')

Now compute the sentiment for each sentence by summing the scores per line:

scoredf <- aggregate(scores ~ line, data = merged, sum)
> scoredf
  line  scores
1    1  0.2903
2    2 -0.7649
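
If you would rather stay within the tidyverse, the merge and aggregate steps can probably be written as a single dplyr pipeline. This is a sketch rather than part of the original answer. The tokens and lexicon are written out again only so that the block runs on its own:

```r
library(dplyr)
library(tibble)

# same tokens and lexicon as above, repeated so the block is self-contained
tidy <- tibble(
  line = c(rep(1L, 7), rep(2L, 6)),
  word = c("i", "am", "happy", "and", "kind", "of", "sad",
           "sad", "is", "sad", "happy", "is", "good")
)
lexicon <- tibble(word = c("happy", "sad"), scores = c(1.3455, -1.0552))

scoredf <- tidy %>%
  inner_join(lexicon, by = "word") %>%  # keep only words found in the lexicon
  group_by(line) %>%
  summarise(scores = sum(scores))       # one summed score per sentence
```

A word that occurs twice in a sentence is counted twice, which is why the second sentence sums to 2 * (-1.0552) + 1.3455.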

Finally, you can merge the initial df with the scores to see each sentence next to its score:

merge(df, scoredf, by = 'line')
  line                       text  scores
1    1 I am happy and kind of sad  0.2903
2    2  sad is sad, happy is good -0.7649
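
One thing to watch for: merge() performs an inner join by default, so a sentence that contains no lexicon words would drop out of the result. Here is a sketch of one way to keep such sentences, assuming a score of 0 is the right value for them. The third sentence is made up for illustration:

```r
# df and scoredf as above, plus a made-up sentence with no lexicon words
df <- data.frame(line = 1:3,
                 text = c("I am happy and kind of sad",
                          "sad is sad, happy is good",
                          "nothing to see here"),
                 stringsAsFactors = FALSE)
scoredf <- data.frame(line = 1:2, scores = c(0.2903, -0.7649))

# all.x = TRUE keeps every row of df; sentences without a score get NA
result <- merge(df, scoredf, by = "line", all.x = TRUE)

# treat "no lexicon words found" as a neutral score of 0 (an assumption)
result$scores[is.na(result$scores)] <- 0
```

Whether 0 or NA is the better choice depends on whether you want to tell "neutral" apart from "not covered by the lexicon".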

This approach works for any number of sentences and gives each one an overall sentiment score.
Hope it helps!
