
How to do sentiment analysis of headlines with TextBlob and Python

I want to calculate the polarity and subjectivity of some headlines that I have. My code runs without any errors, but for some rows it returns 0.00000 for both polarity and subjectivity. Do you know why?

You can download the data from here:

https://www.sendspace.com/file/e8w4tw

Am I doing something wrong? This is the code:

import pandas as pd
from textblob import TextBlob

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

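# Newer pandas versions reject an encoding argument to read_excel, so it is omitted here.
df = pd.read_excel('coca cola news.xlsx')

# Drop rows with missing values, then exact duplicates.
df = df.dropna().reset_index(drop=True)
df = df.drop_duplicates().reset_index(drop=True)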
print(df)

# Polarity is in [-1, 1]; subjectivity is in [0, 1].
df['Headline Sentiment'] = df['Headline'].apply(lambda text: TextBlob(text).sentiment.polarity).round(4)
df['Headline Subjectivity'] = df['Headline'].apply(lambda text: TextBlob(text).sentiment.subjectivity).round(4)

df['Paragraph Sentiment'] = df['Paragraph'].apply(lambda text: TextBlob(text).sentiment.polarity).round(4)
df['Paragraph Subjectivity'] = df['Paragraph'].apply(lambda text: TextBlob(text).sentiment.subjectivity).round(4)

print(df)

print(df[df.columns[-4:]])

I know that 0 is a possible result, but I'm getting 0.0000 in 40%-50% of the rows. That is a lot, and it is never even 0.00001, which seems strange to me.

Can you help me?
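To put a number on the "40%-50% of rows" observation, the share of exact zeros in a score column can be counted directly. This is only a minimal sketch: `zero_fraction` is a hypothetical helper name, and the sample values are made up for illustration rather than taken from the dataset.

```python
def zero_fraction(scores):
    """Return the share of scores that are exactly 0."""
    scores = list(scores)
    if not scores:
        return 0.0  # avoid dividing by zero on an empty column
    return sum(1 for s in scores if s == 0) / len(scores)


# Made-up polarity values for illustration, not results from the dataset.
sample = [0.0, 0.5, 0.0, -0.25]
print(zero_fraction(sample))  # 0.5
```

Because the helper accepts any iterable of numbers, it should also work on a column such as `df['Headline Sentiment']` from the code above.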

This sometimes happens. Try the polarity method from polyglot (https://polyglot.readthedocs.io/en/latest/Installation.html) and compare the results. First, you should do some preprocessing, such as:

  • remove stopwords
  • remove numbers, HTML links, special characters, etc.
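The preprocessing steps listed above could look roughly like the sketch below. It uses only the standard library; the `preprocess` name, the regular expressions, and the tiny `STOPWORDS` set are assumptions made for illustration, and a real run would probably use a fuller stopword list.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only.
# Negations such as "not" are left out on purpose, since they change polarity.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "for", "and", "is", "at"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def preprocess(text):
    """Lowercase, strip links, digits and special characters, drop stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # remove links before punctuation is stripped
    text = NON_LETTER_RE.sub(" ", text)  # remove digits and special characters
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


print(preprocess("Coca-Cola shares up 3% in 2020: see https://example.com/news"))
# coca cola shares up see
```

The cleaned text can then be scored instead of the raw column, for example with `df['Headline'].apply(preprocess)` before calling TextBlob. Note that TextBlob's default analyzer is lexicon-based, so a headline whose words are all missing from its lexicon will still score 0.0 after cleaning.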
