How can I label text as positive or negative in R based on words from a dictionary?

Question

Suppose I have a data frame with one column of comments (each row is a different comment):

comment
'well done!'
'terrible work'
'quit your job'
'hi'

I also have the following data frame of positive and negative words (i.e. a dictionary):

positive negative
well     terrible
done     quit

Is there a way in R to use this dictionary to label each comment in the first data frame as positive, negative, or neutral, depending on whether it contains more positive or more negative words?

I.e. I want the output to be a data frame that looks like this:

comment          label
'well done!'     positive
'terrible work'  negative
'quit your job'  negative
'hi'             neutral

Does anyone know how this can be done in R?

Answer 1

Does this work?

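library(dplyr)
library(stringr)
# Example data: one comment per row, plus the dictionary as a tibble
comm <- data.frame(comments = c('well done!', 'terrible work', 'quit your job', 'hi'), stringsAsFactors = FALSE)
dict <- tibble(positive = c('well', 'done'), negative = c('terrible', 'quit'))
# case_when() uses the first matching condition, so a comment containing both kinds of word is labelled 'positive'
comm %>% mutate(label = case_when(str_detect(comments, str_c(dict$positive, collapse = '|')) ~ 'positive',
                                   str_detect(comments, str_c(dict$negative, collapse = '|')) ~ 'negative',
                                   TRUE ~ 'neutral'))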
       comments    label
1    well done! positive
2 terrible work negative
3 quit your job negative
4            hi  neutral
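One caveat with the pattern above: `str_c(dict$positive, collapse = '|')` builds `well|done`, which also matches inside longer words (for example "unwell"). A minimal base-R sketch of the same first-match logic, assuming you want whole-word, case-insensitive matches, wraps the alternation in `\b` word boundaries. The names `pos_re`, `neg_re`, `has_pos` and `has_neg` are illustrative only:

```r
# Example data from the question (column named `comments`, as in the answer)
comm <- data.frame(comments = c("well done!", "terrible work", "quit your job", "hi"),
                   stringsAsFactors = FALSE)
dict <- data.frame(positive = c("well", "done"),
                   negative = c("terrible", "quit"),
                   stringsAsFactors = FALSE)

# Build whole-word patterns such as \b(well|done)\b so that
# "well" is not matched inside longer words like "unwell"
pos_re <- paste0("\\b(", paste(dict$positive, collapse = "|"), ")\\b")
neg_re <- paste0("\\b(", paste(dict$negative, collapse = "|"), ")\\b")

# As with case_when(), the positive test wins when both kinds of word occur
has_pos <- grepl(pos_re, comm$comments, ignore.case = TRUE)
has_neg <- grepl(neg_re, comm$comments, ignore.case = TRUE)
comm$label <- ifelse(has_pos, "positive", ifelse(has_neg, "negative", "neutral"))
comm
```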

Based on the OP's requirement (label by whichever kind of word occurs more often, and neutral on a tie):

comm %>% mutate(p_count = str_count(comments, str_c(dict$positive, collapse = '|')), 
                 n_count = str_count(comments, str_c(dict$negative, collapse = '|'))) %>% 
           mutate(label = case_when(p_count > n_count ~ 'positive',
                                    p_count < n_count ~ 'negative',
                                    TRUE ~ 'neutral')) %>% select(comments, label)
                 comments    label
1              well done! positive
2      terrible well work  neutral
3 quit your job well well positive
4                      hi  neutral
5      terrible quit well negative

New data used:

comm
                 comments
1              well done!
2      terrible well work
3 quit your job well well
4                      hi
5      terrible quit well

dict
# A tibble: 2 x 2
  positive negative
  <chr>    <chr>   
1 well     terrible
2 done     quit
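The count-based version can also be sketched in base R without regular expressions, assuming the comments are plain English text: split each comment into lowercase words, then count how many of them appear in each dictionary column with `%in%`. The splitting pattern `[^a-z']+` is an assumption; it treats anything other than ASCII letters and apostrophes as a separator. The helper names `words`, `p_count` and `n_count` are illustrative:

```r
# Data from the "New data used" listing above
comm <- data.frame(comments = c("well done!", "terrible well work",
                                "quit your job well well", "hi",
                                "terrible quit well"),
                   stringsAsFactors = FALSE)
dict <- data.frame(positive = c("well", "done"),
                   negative = c("terrible", "quit"),
                   stringsAsFactors = FALSE)

# Lowercase each comment and split it into words on anything that is
# not an ASCII letter or apostrophe (so "done!" becomes "done")
words <- strsplit(tolower(comm$comments), "[^a-z']+")

# Count dictionary hits per comment; a repeated word counts each time
p_count <- vapply(words, function(w) sum(w %in% dict$positive), integer(1))
n_count <- vapply(words, function(w) sum(w %in% dict$negative), integer(1))

# More positive words -> positive, more negative -> negative, tie -> neutral
comm$label <- ifelse(p_count > n_count, "positive",
                     ifelse(p_count < n_count, "negative", "neutral"))
comm
```

Unlike the `str_count()` version, this only counts whole words, so a word such as "unwell" would not add to the positive count.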
