
What is an accurate Twitter sentiment analysis solution with Python?

I have a CSV file of 20K tweets with metadata such as location, username, and date. I want to assign a positive/neutral/negative label to each tweet using Python. I used the following code, based on the textblob library, for tweet sentiment analysis.

import csv
from textblob import TextBlob
import sys

# Do some version specific stuff
if sys.version[0] == '3':
    from importlib import reload
    sntTweets = csv.writer(open("sentimentTweets.csv", "w", newline=''))

if sys.version[0] == '2':
    reload(sys)
    sys.setdefaultencoding("utf-8")
    sntTweets = csv.writer(open("sentimentTweets.csv", "w"))

alltweets = csv.reader(open("Corona.csv", 'r'))

for row in alltweets:
    blob = TextBlob(row[2])
    print (blob.sentiment.polarity)
    if blob.sentiment.polarity > 0:
        sntTweets.writerow([row[0], row[1], row[2], row[3], blob.sentiment.polarity, "positive"])
    elif blob.sentiment.polarity < 0:
        sntTweets.writerow([row[0], row[1], row[2], row[3], blob.sentiment.polarity, "negative"])
    elif blob.sentiment.polarity == 0.0:
        sntTweets.writerow([row[0], row[1], row[2], row[3], blob.sentiment.polarity, "neutral"])

This code runs fine and produces the sentimentTweets.csv file. I like that it gives me two things for each tweet: a number between -1 and 1, and a negative/neutral/positive classification.

But it is not accurate. For example, it labels the following tweet positive, with a score of 0.285714285714285: "RT @eliyudin: “I’ll have a Corona... hold the virus!” -a dad on vacation somewhere in Florida right now"
As you can see, the sentiment of this tweet should be negative. How can I make the output more accurate? And how can I measure the accuracy of my output?

TextBlob estimates polarity from the polarity of individual words and chunks of the input (code here: https://github.com/sloria/TextBlob/blob/e6cd9791ae42e37b5a2132676f9ca69340e8d8c0/textblob/_text.py#L854 ). Such an approach is easily confused by noisy text like tweets. It is also hard to improve, because it depends on the quality of the underlying language resources.
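To see why this kind of scoring is fragile, here is a minimal sketch of lexicon averaging. It is not TextBlob's actual implementation, which does considerably more; the `TOY_LEXICON` values and the `lexicon_polarity` helper are made up purely for illustration.

```python
import re

# Hypothetical word scores, invented for this illustration (not TextBlob's lexicon).
TOY_LEXICON = {"right": 0.3, "great": 0.8, "terrible": -1.0}

def lexicon_polarity(text):
    """Average the scores of the words found in the lexicon; 0.0 if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

tweet = "I'll have a Corona... hold the virus! -a dad on vacation somewhere in Florida right now"
print(lexicon_polarity(tweet))  # positive, driven entirely by the single word "right"
```

In this toy setup a single incidental word decides the whole score, and nothing in the method can pick up on the joke or the context.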

I would suggest using a fully machine-learned model such as Flair:

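import flair

# Load the pre-trained English sentiment classifier (downloaded on first use).
flair_sentiment = flair.models.TextClassifier.load('en-sentiment')

sentence = "I love this!"  # placeholder; use the tweet text here, e.g. row[2]
s = flair.data.Sentence(sentence)
flair_sentiment.predict(s)
total_sentiment = s.labels  # predicted labels, each with a value and a score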
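Note that this model predicts only POSITIVE or NEGATIVE together with a confidence score, whereas the question asks for a signed number and a neutral class. One possible way to bridge the gap is sketched below; the `to_three_way` helper and the `neutral_below` cutoff of 0.6 are assumptions for illustration, not part of Flair, and the cutoff would need tuning on your own data.

```python
def to_three_way(label_value, confidence, neutral_below=0.6):
    """Map a binary prediction to (signed score, 'positive'/'neutral'/'negative')."""
    # Sign the confidence: POSITIVE -> +confidence, anything else -> -confidence.
    signed = confidence if label_value.upper() == "POSITIVE" else -confidence
    # Treat low-confidence predictions as neutral (the cutoff is an assumption).
    if confidence < neutral_below:
        return signed, "neutral"
    return signed, ("positive" if signed > 0 else "negative")

print(to_three_way("NEGATIVE", 0.9))  # (-0.9, 'negative')
```

With the snippet above, it could be called as `to_three_way(s.labels[0].value, s.labels[0].score)`.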

It should also be easy to train your own model with FastText: https://github.com/charlesmalafosse/FastText-sentiment-analysis-for-tweets
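As for measuring accuracy: no library can tell you how accurate it is on your data without reference labels. A common approach is to hand-label a random sample of your tweets and compare those labels with the predictions. The sketch below assumes you already have two parallel lists of labels; the `evaluate` helper and the sample labels are made up for illustration.

```python
from collections import Counter

def evaluate(gold, predicted):
    """Return overall accuracy and a Counter of (gold, predicted) label pairs."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = Counter(zip(gold, predicted))
    correct = sum(n for (g, p), n in pairs.items() if g == p)
    return correct / len(gold), pairs

# Made-up labels for illustration only.
gold = ["negative", "positive", "neutral", "negative"]
predicted = ["positive", "positive", "neutral", "negative"]
accuracy, confusion = evaluate(gold, predicted)
print(accuracy)  # 0.75
print(confusion[("negative", "positive")])  # 1
```

The pair counts show which kinds of mistakes dominate (here, one negative tweet predicted as positive), which is usually more informative than the single accuracy figure.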
