TextBlob and sentiment analysis: how to refine a dictionary?

Many people use TextBlob for sentiment analysis on text. I am sure I am missing something about the approach and how to use it, but the results I am getting from my analysis do not make sense.

This is an example of the data I have:

     Top                                                Text                                               label  sentiment                                    polarity
51   CVD-Grown Carbon Nanotube Branches on Black Si...  silicon-carbon nanotube (bSi-CNT) hybrid struc...  -1     (-0.16666666666666666, 0.43333333333333335)  -0.166667
69   Navy postpones its largest-ever Milan exercise...  Navy on Tuesday postponed a multi-nation mega ...  -1     (-0.125, 0.375)                              -0.125000
81   Malaysia rings alarm bell on fake Covid...         The United Nations International Children's Em...  -1     (-0.5, 1.0)                                  -0.500000
82   Poison Not Transmitted By Air...                   it falls on the fabric remains 9 hours, so was...  -1     (-0.2, 0.0)                                  -0.200000
87   A WhatsApp rumor is spreading that is allegedl...  strict about unsourced speculation than other ...  -1     (-0.1, 0.1)                                  -0.100000
90   Dumb Whatsapp Forwards - Page 2 - Cricket Web      as the ones that say like or share this pictur...  -1     (-0.375, 0.5)                                -0.375000
144  malaysia | Unicef Malaysia rings alarm b...        such messages claiming to be from us,” #Milan...   -1     (-0.5, 1.0)                                  -0.500000
134  False and unverified claims are being...           Soccer was not issued by the U...                  -1     (-0.4000000000000001, 0.6)                   -0.400000
123  Truth behind the Viral message about Co...         number of stories ever since the wave of misin...  -1     (-0.4, 0.7)                                  -0.400000
166  In India, Fake WhatsApp Forwards on Coronaviru...  of confirmed cases of rises rapidl...              -1     (-0.5, 1.0)                                  -0.500000

I used the following code:

import numpy as np
import pandas as pd
from textblob import TextBlob

# Score each title; .sentiment is a (polarity, subjectivity) namedtuple
df['sentiment'] = df['Top'].apply(lambda Tweet: TextBlob(Tweet).sentiment)

# Expand the tuples into separate 'polarity' and 'subjectivity' columns
df1 = pd.DataFrame(df['sentiment'].tolist(), index=df.index)

df_new = df
df_new['polarity'] = df1['polarity']
df_new.polarity = df1.polarity.astype(float)
df_new['subjectivity'] = df1['subjectivity']
df_new.subjectivity = df1.polarity.astype(float)
# print(df_new)

# Label each row by the sign of its polarity
conditionList = [
    df_new['polarity'] == 0,
    df_new['polarity'] > 0,
    df_new['polarity'] < 0]
choiceList = ['neutral', 'not_fake', 'fake']
df_new['label'] = np.select(conditionList, choiceList, default='no_label')

But as you can see, all of these messages come from fact-checking sources, so they are not fake. How could I improve the results, maybe by removing some specific words? I can see that if the text contains words such as false, unverified, viral, or fake, it is tagged as negative, and this makes the results even worse.

All of your texts have negative polarity, so your code labels them as fake.
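To make that concrete, here is a minimal plain-Python sketch of the labeling rule from the question (the same three conditions passed to np.select, with the same default), applied to the polarity values shown in the sample. The function name label_from_polarity is hypothetical and introduced only for illustration.

```python
def label_from_polarity(polarity):
    """Mirror the question's np.select conditions for a single value."""
    if polarity == 0:
        return "neutral"
    if polarity > 0:
        return "not_fake"
    if polarity < 0:
        return "fake"
    return "no_label"  # reached only when no condition holds, e.g. NaN

# Polarity values copied from the sample rows in the question.
sample_polarities = [-0.166667, -0.125, -0.5, -0.2, -0.1,
                     -0.375, -0.5, -0.4, -0.4, -0.5]
labels = [label_from_polarity(p) for p in sample_polarities]
print(labels)
```

Every value in the sample is below zero, so the rule can only ever produce "fake" for these rows. The rule equates negative sentiment with fakeness, which is a modeling choice rather than something the sentiment score itself measures.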

There is no indication of how that polarity field is determined; it is precalculated in the source file. If it uses TextBlob's default polarity algorithm, what text is it running against?
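One way to investigate is to look at which words drive the score. The sketch below is a toy lexicon-based scorer, not TextBlob's actual implementation: the lexicon, its scores, the example sentence, and the function name toy_polarity are all made up for illustration. It assumes polarity is simply the average score of the words found in the lexicon, and shows how dropping topic words such as "fake" from the lexicon changes the result.

```python
import re

# Hypothetical word scores, invented for this example.
TOY_LEXICON = {"fake": -0.5, "false": -0.4, "unverified": -0.3, "good": 0.7}

def toy_polarity(text, lexicon):
    """Average the scores of the words found in the lexicon (0.0 if none)."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical fact-checking headline, not taken from the data set.
headline = "Fact check: this viral claim is fake"

# Words that describe the topic (misinformation) rather than the tone.
domain_words = {"fake", "false", "unverified", "viral"}
refined_lexicon = {w: s for w, s in TOY_LEXICON.items() if w not in domain_words}

print(toy_polarity(headline, TOY_LEXICON))      # score with the full toy lexicon
print(toy_polarity(headline, refined_lexicon))  # score after removing domain words
```

In this toy setup the headline scores negative only because of the word "fake"; once the domain words are removed, nothing in the sentence carries a score. Whether the same adjustment is appropriate for a real lexicon depends on the data and would need to be checked against labeled examples.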

(Also, there may be a typo: df_new.subjectivity is being assigned the float cast of polarity.)
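If that is indeed a typo, the intended assignment presumably reads from the subjectivity field instead. A minimal plain-Python sketch, using (polarity, subjectivity) pairs copied from the sample; the Sentiment namedtuple here is defined locally as a stand-in with the two fields used in the question:

```python
from collections import namedtuple

# Local stand-in with the two fields the question's code reads.
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])

# (polarity, subjectivity) pairs copied from the sample rows.
sentiments = [Sentiment(-0.125, 0.375), Sentiment(-0.5, 1.0), Sentiment(-0.2, 0.0)]

polarity = [float(s.polarity) for s in sentiments]
subjectivity = [float(s.subjectivity) for s in sentiments]  # not s.polarity

print(polarity)
print(subjectivity)
```

In the question's DataFrame code, the equivalent change would be df_new.subjectivity = df1.subjectivity.astype(float).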
