Why is TextBlob not using / detecting the negation?

I am using TextBlob to perform a sentiment analysis task. I have noticed that TextBlob detects negation in some cases but not in others.

Here are two simple examples:

>>> from textblob.sentiments import PatternAnalyzer

>>> sentiment_analyzer = PatternAnalyzer()
# example 1
>>> sentiment_analyzer.analyze('This is good')
Sentiment(polarity=0.7, subjectivity=0.6000000000000001)

>>> sentiment_analyzer.analyze('This is not good')
Sentiment(polarity=-0.35, subjectivity=0.6000000000000001)

# example 2
>>> sentiment_analyzer.analyze('I am the best')
Sentiment(polarity=1.0, subjectivity=0.3)

>>> sentiment_analyzer.analyze('I am not the best')  
Sentiment(polarity=1.0, subjectivity=0.3)

As you can see in the second example, when the adjective best is used, the polarity does not change. I suspect this has to do with the fact that best is a very strong indicator, but that doesn't seem right, because (as I understand it) the negation should have reversed the polarity.

Can anyone explain what is going on? Does TextBlob use a negation mechanism at all, or does the word not simply add negative sentiment to the sentence? In either case, why do both sentences in the second example get exactly the same sentiment? Are there any suggestions on how to overcome such obstacles?

(edit: my old answer was more about general classifiers and not about PatternAnalyzer)

In your code, TextBlob uses the "PatternAnalyzer". Its behaviour is briefly described in this document: http://www.clips.ua.ac.be/pages/pattern-en#parser

We can see that:

The pattern.en module bundles a lexicon of adjectives (e.g., good, bad, amazing, irritating, ...) that occur frequently in product reviews, annotated with scores for sentiment polarity (positive ↔ negative) and subjectivity (objective ↔ subjective).

The sentiment() function returns a (polarity, subjectivity)-tuple for the given sentence, based on the adjectives it contains.

Here's an example that shows the behaviour of the algorithm. The polarity depends directly on the adjective used.

sentiment_analyzer.analyze('player')
Sentiment(polarity=0.0, subjectivity=0.0)

sentiment_analyzer.analyze('bad player')
Sentiment(polarity=-0.6999998, subjectivity=0.66666)

sentiment_analyzer.analyze('worst player')
Sentiment(polarity=-1.0, subjectivity=1.0)

sentiment_analyzer.analyze('best player')
Sentiment(polarity=1.0, subjectivity=0.3)
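
The lexicon lookup described above can be sketched in a few lines. This is only an illustration: TOY_LEXICON and toy_sentiment are hypothetical names, the scores are copied (rounded) from the outputs shown above, and averaging over the matched words is an assumption, not a copy of the pattern.en code.

```python
# Hypothetical toy lexicon: word -> (polarity, subjectivity).
# Scores are taken (rounded) from the PatternAnalyzer outputs shown above.
TOY_LEXICON = {
    "bad": (-0.7, 0.67),
    "worst": (-1.0, 1.0),
    "best": (1.0, 0.3),
}


def toy_sentiment(text):
    """Average the scores of the known words; unknown words are ignored."""
    hits = [TOY_LEXICON[w] for w in text.lower().split() if w in TOY_LEXICON]
    if not hits:
        # No known adjective: neutral and objective, like 'player' above.
        return (0.0, 0.0)
    polarity = sum(p for p, _ in hits) / len(hits)
    subjectivity = sum(s for _, s in hits) / len(hits)
    return (polarity, subjectivity)


for phrase in ("player", "bad player", "worst player", "best player"):
    print(phrase, "->", toy_sentiment(phrase))
```

A word such as "player" contributes nothing because it is not in the lexicon, which is why only the adjective drives the score.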

Professional software generally uses complex tools based on neural networks and classifiers combined with lexical analysis. As far as I can tell, TextBlob just tries to give a result taken directly from the grammar analysis (here, the polarity of the adjectives). That is the source of the problem.

It does not try to check whether the sentence as a whole is negated (by the word "not"). It checks whether the adjective itself is negated, since it works only with adjectives and not with the overall sentence structure. Here, best is used as a noun and is not a negated adjective, so the polarity stays positive.

sentiment_analyzer.analyze('not the best')
Sentiment(polarity=1.0, subjectivity=0.3)

Just change the order of the words so that the negation applies to the adjective rather than to the whole sentence.

sentiment_analyzer.analyze('the not best')
Sentiment(polarity=-0.5, subjectivity=0.3)

Here, the adjective is negated, so the polarity is negative. That is my explanation of this "strange behaviour".


The real implementation is defined in the file: https://github.com/sloria/TextBlob/blob/dev/textblob/_text.py

The interesting portion is:

if w in self and pos in self[w]:
    p, s, i = self[w][pos]
    # Known word not preceded by a modifier ("good").
    if m is None:
        a.append(dict(w=[w], p=p, s=s, i=i, n=1, x=self.labeler.get(w)))
    # Known word preceded by a modifier ("really good").

    ...


else:
    # Unknown word may be a negation ("not good").
    if negation and w in self.negations:
        n = w
    # Unknown word. Retain negation across small words ("not a good").
    elif n and len(w.strip("'")) > 1:
        n = None
    # Unknown word may be a negation preceded by a modifier ("really not good").
    if n is not None and m is not None and (pos in self.modifiers or self.modifier(m[0])):
        a[-1]["w"].append(n)
        a[-1]["n"] = -1
        n = None
    # Unknown word. Retain modifier across small words ("really is a good").
    elif m and len(w) > 2:
        m = None
    # Exclamation marks boost previous word.
    if w == "!" and len(a) > 0:

    ...

If we enter "not a good" or "not the good", the words before the adjective go through the else part, because they are not known adjectives from the lexicon.

In "not a good", the word "a" is only one character long, so the branch elif n and len(w.strip("'")) > 1: is not taken, the pending negation n is kept, and the polarity of "good" is reversed. In "not the best", the word "the" is longer than one character, so that branch resets n to None; the negation is dropped and the polarity is the same as for "best" alone.
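
To make the effect of that elif branch concrete, here is a minimal sketch that re-creates only the negation tracking. It is not the TextBlob code: LEXICON, NEGATIONS, NEGATION_FACTOR and toy_polarity are hypothetical, and the -0.5 factor is an assumption inferred from the outputs above (0.7 becomes -0.35, 1.0 becomes -0.5).

```python
# Toy re-creation of the negation handling in the excerpt above.
# LEXICON, NEGATIONS and the -0.5 factor are assumptions for illustration.
LEXICON = {"good": 0.7, "best": 1.0}
NEGATIONS = {"not"}
NEGATION_FACTOR = -0.5


def toy_polarity(text):
    """Average polarity of known words, tracking a pending negation `n`."""
    scores = []
    n = None  # pending negation word, if any
    for w in text.lower().split():
        if w in LEXICON:
            p = LEXICON[w]
            scores.append(p * NEGATION_FACTOR if n is not None else p)
            n = None  # the negation has been consumed
        elif w in NEGATIONS:
            n = w
        elif n and len(w.strip("'")) > 1:
            # Any unknown word longer than one character drops the negation.
            n = None
    return sum(scores) / len(scores) if scores else 0.0


for s in ("not good", "not a good", "not the best", "the not best"):
    print(s, "->", toy_polarity(s))
```

In this sketch "not a good" stays negated because "a" is too short to reset n, while "not the best" loses its negation at "the", which matches the outputs reported in the question.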

The entire code is a succession of fine tweaks and grammar indications (for example, adding ! increases the polarity, adding a smiley indicates irony, ...). That is why some particular patterns give strange results. To handle a specific case, you must check whether your sentence matches any of the if branches in that part of the code.

I hope this helps.
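
One possible workaround for the "not the best" case, offered as an assumption rather than a TextBlob feature, is to drop articles that directly follow a negation before passing the text to the analyzer, so that the negation sits next to the adjective. The helper name and both word lists below are hypothetical, and whether this improves results on real data would need to be checked.

```python
# Hypothetical preprocessing helper; the word lists are illustrative assumptions.
NEGATION_WORDS = {"not", "never", "no"}
SKIPPABLE = {"a", "an", "the"}


def drop_articles_after_negation(text):
    """Remove articles that immediately follow a negation word."""
    kept = []
    pending = False  # True while the last kept word was a negation
    for word in text.split():
        lowered = word.lower()
        if pending and lowered in SKIPPABLE:
            continue  # skip the article and keep the negation pending
        pending = lowered in NEGATION_WORDS
        kept.append(word)
    return " ".join(kept)


print(drop_articles_after_negation("I am not the best"))
```

The cleaned string could then be passed to sentiment_analyzer.analyze(...). Judging from the excerpt and the "the not best" example, a negation placed directly before the adjective should be picked up, but that is an expectation to verify, not a guarantee.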
