
How to label text either positive or negative based on words in a dictionary in R?

Suppose I have a data frame containing comments (each row is a different comment):

comment
'well done!'
'terrible work'
'quit your job'
'hi'

And I have the following data frame containing positive and negative words (i.e. a dictionary):

positive negative
well     terrible
done     quit

Is there a way in R to use this dictionary to label the comments in the first data frame as either positive, negative or neutral, depending on whether they contain more positive words, more negative words, or an equal number of each?

I.e. I want the output to be a data frame that looks like:

comment          label
'well done!'     positive
'terrible work'  negative
'quit your job'  negative
'hi'             neutral

Does anyone know how this can be done in R?

Does this work:

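library(dplyr)
library(stringr)
# Collapse each dictionary column into one regex alternation, e.g. 'well|done'.
# case_when() uses the first condition that is TRUE, so positive is checked before negative.
comm %>% mutate(label = case_when(str_detect(comments, str_c(dict$positive, collapse = '|')) ~ 'positive',
                                   str_detect(comments, str_c(dict$negative, collapse = '|')) ~ 'negative',
                                   TRUE ~ 'neutral'))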
       comments    label
1    well done! positive
2 terrible work negative
3 quit your job negative
4            hi  neutral
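
One thing to be aware of with the detection-only version above: it never compares counts, and the patterns match substrings. The sketch below illustrates both effects. The two example comments are hypothetical and not from the original post.

```r
library(dplyr)
library(stringr)

# Dictionary as in the question
dict <- data.frame(positive = c("well", "done"),
                   negative = c("terrible", "quit"),
                   stringsAsFactors = FALSE)

# Hypothetical comments: one with more negative than positive words,
# one that merely contains "well" inside a longer word
mixed <- data.frame(comments = c("terrible quit well", "farewell"),
                    stringsAsFactors = FALSE)

labelled <- mixed %>%
  mutate(label = case_when(
    str_detect(comments, str_c(dict$positive, collapse = "|")) ~ "positive",
    str_detect(comments, str_c(dict$negative, collapse = "|")) ~ "negative",
    TRUE ~ "neutral"
  ))

# Both rows come out "positive": the first because the positive branch is
# tested first, the second because "well" matches inside "farewell"
labelled$label
```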

Based on OP's requirement:

comm %>%
  # count dictionary hits per comment
  mutate(p_count = str_count(comments, str_c(dict$positive, collapse = '|')),
         n_count = str_count(comments, str_c(dict$negative, collapse = '|'))) %>%
  # label by whichever count is larger; ties are neutral
  mutate(label = case_when(p_count > n_count ~ 'positive',
                           p_count < n_count ~ 'negative',
                           TRUE ~ 'neutral')) %>%
  select(comments, label)
                 comments    label
1              well done! positive
2      terrible well work  neutral
3 quit your job well well positive
4                      hi  neutral
5      terrible quit well negative
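
If whole-word, case-insensitive matching is wanted, the counting approach can be adapted roughly as follows. This is only a sketch: the helper name `word_pattern` and the extra comment "Farewell" are illustrative assumptions, and it assumes the dictionary words contain no regex metacharacters (otherwise they would need escaping).

```r
library(dplyr)
library(stringr)

dict <- data.frame(positive = c("well", "done"),
                   negative = c("terrible", "quit"),
                   stringsAsFactors = FALSE)

# Answer's data plus one hypothetical row ("Farewell") to show the boundary effect
comm <- data.frame(comments = c("well done!", "terrible well work",
                                "quit your job well well", "hi",
                                "terrible quit well", "Farewell"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: wrap the alternation in \b...\b and ignore case
word_pattern <- function(words) {
  regex(str_c("\\b(", str_c(words, collapse = "|"), ")\\b"),
        ignore_case = TRUE)
}

result <- comm %>%
  mutate(p_count = str_count(comments, word_pattern(dict$positive)),
         n_count = str_count(comments, word_pattern(dict$negative)),
         label = case_when(p_count > n_count ~ "positive",
                           p_count < n_count ~ "negative",
                           TRUE ~ "neutral")) %>%
  select(comments, label)

result
```

With the word boundaries in place, "Farewell" no longer counts as a hit for "well" and is labelled neutral, while the other five rows keep the labels shown above.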

New data used:

comm
                 comments
1              well done!
2      terrible well work
3 quit your job well well
4                      hi
5      terrible quit well

dict
# A tibble: 2 x 2
  positive negative
  <chr>    <chr>   
1 well     terrible
2 done     quit    
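
The answer shows only the printed objects. One way to recreate them for testing might be the following, assuming `comm` is a plain data frame with a `comments` column and `dict` is a tibble, as the printouts suggest:

```r
library(dplyr)  # re-exports tibble()

# Comments, one per row, in a column named `comments` as in the answer's output
comm <- data.frame(comments = c("well done!",
                                "terrible well work",
                                "quit your job well well",
                                "hi",
                                "terrible quit well"),
                   stringsAsFactors = FALSE)

# Dictionary: one column of positive words, one of negative words
dict <- tibble(positive = c("well", "done"),
               negative = c("terrible", "quit"))
```

Note that the question names the column `comment`, whereas the answer's code uses `comments`, so the column name has to be adjusted to match your own data.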

