
Expanding Twitter sentiment analysis

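The code below classifies the sentiment of tweets as positive, negative, or neutral. However, it is quite inaccurate for many tweets. For example, if a tweet contains "someone gave him a middle finger salute", I want to train the program to recognize that the middle finger signals disrespect, even though the sentence also contains the word "salute".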

Any suggestions would be greatly appreciated.

import re
import tweepy
from tweepy import OAuthHandler
from textblob import TextBlob

class TwitterClient(object):
    '''
    Generic Twitter Class for sentiment analysis.
    '''
    def __init__(self):
        '''
        Class constructor or initialization method.
        '''
        # keys and tokens from the Twitter Dev Console
        # (placeholders: never post real credentials publicly)
        consumer_key = 'YOUR_CONSUMER_KEY'
        consumer_secret = 'YOUR_CONSUMER_SECRET'
        access_token = 'YOUR_ACCESS_TOKEN'
        access_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'


        # attempt authentication
        try:
            # create OAuthHandler object
            self.auth = OAuthHandler(consumer_key, consumer_secret)
            # set access token and secret
            self.auth.set_access_token(access_token, access_token_secret)
            # create tweepy API object to fetch tweets
            self.api = tweepy.API(self.auth)
        except Exception:
            print("Error: Authentication Failed")

    def clean_tweet(self, tweet):
        '''
        Utility function to clean tweet text by removing links, special characters
        using simple regex statements.
        '''
        return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())

    def get_tweet_sentiment(self, tweet):
        '''
        Utility function to classify sentiment of passed tweet
        using textblob's sentiment method
        '''
        # create TextBlob object of passed tweet text
        analysis = TextBlob(self.clean_tweet(tweet))
        # set sentiment
        if analysis.sentiment.polarity > 0:
            return 'positive'
        elif analysis.sentiment.polarity == 0:
            return 'neutral'
        else:
            return 'negative'

    def get_tweets(self, query, count = 30):
        '''
        Main function to fetch tweets and parse them.
        '''
        # empty list to store parsed tweets
        tweets = []

        try:
            # call twitter api to fetch tweets
            fetched_tweets = self.api.search(q = query, count = count)

            # parsing tweets one by one
            for tweet in fetched_tweets:
                # empty dictionary to store required params of a tweet
                parsed_tweet = {}

                # saving text of tweet
                parsed_tweet['text'] = tweet.text
                # saving sentiment of tweet
                parsed_tweet['sentiment'] = self.get_tweet_sentiment(tweet.text)

                # appending parsed tweet to tweets list
                if tweet.retweet_count > 0:
                    # if tweet has retweets, ensure that it is appended only once
                    if parsed_tweet not in tweets:
                        tweets.append(parsed_tweet)
                else:
                    tweets.append(parsed_tweet)

            # return parsed tweets
            return tweets

        except tweepy.TweepError as e:
            # print error (if any) and return whatever was parsed so far
            print("Error : " + str(e))
            return tweets

def main():
    # creating object of TwitterClient Class
    api = TwitterClient()
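    # calling function to get tweets
    tweets = api.get_tweets(query = 'Donald Trump', count = 200)
    # stop early if nothing came back, to avoid dividing by zero below
    if not tweets:
        print("No tweets fetched")
        return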

    # picking positive tweets from tweets
    ptweets = [tweet for tweet in tweets if tweet['sentiment'] == 'positive']
    # percentage of positive tweets
    print("Positive tweets percentage: {} %".format(100*len(ptweets)/len(tweets)))
    # picking negative tweets from tweets
    ntweets = [tweet for tweet in tweets if tweet['sentiment'] == 'negative']
    # percentage of negative tweets
    print("Negative tweets percentage: {} %".format(100*len(ntweets)/len(tweets)))
    # percentage of neutral tweets
    print("Neutral tweets percentage: {} %".format(100*(len(tweets) - len(ntweets) - len(ptweets))/len(tweets)))

    # printing up to 20 positive tweets
    print("\n\nPositive tweets:")
    for tweet in ptweets[:20]:
        print(tweet['text'])

    # printing up to 20 negative tweets
    print("\n\nNegative tweets:")
    for tweet in ntweets[:20]:
        print(tweet['text'])

if __name__ == "__main__":
    # calling main function
    main()

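This algorithm does not follow any of the classification procedures used in machine learning, so there is nothing to train. It is based on a very basic statistical procedure, and to run it you need bags of words that have been classified by sentiment in advance (a bag of positive words and a bag of negative words).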

With such a basic statistical procedure, classifying words as neutral is extremely complicated. That is why your algorithm does not work well.
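To make the bag-of-words idea concrete, here is a minimal sketch in plain Python. It is not TextBlob's actual implementation. The word bags, the `NEGATIVE_PHRASES` set, and the `lexicon_polarity` function are all made up for illustration. It shows why "salute" alone pushes a tweet towards positive, and how checking multi-word phrases first could let "middle finger" override it.

```python
import re

# Hypothetical, hand-made word bags; a real lexicon would be far larger.
POSITIVE = {"good", "great", "love", "salute", "respect"}
NEGATIVE = {"bad", "hate", "awful", "terrible"}
# Multi-word phrases are checked first so they can override single words.
NEGATIVE_PHRASES = {"middle finger salute", "middle finger"}


def lexicon_polarity(text):
    """Return 'positive', 'negative', or 'neutral' from simple word counts."""
    lowered = text.lower()
    score = 0
    # Longest phrases first; remove each match so its words are not counted again.
    for phrase in sorted(NEGATIVE_PHRASES, key=len, reverse=True):
        score -= lowered.count(phrase)
        lowered = lowered.replace(phrase, " ")
    # Score the remaining single words against the two bags.
    for word in re.findall(r"[a-z']+", lowered):
        if word in POSITIVE:
            score += 1
        elif word in NEGATIVE:
            score -= 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_polarity("Someone gave him a middle finger salute"))
print(lexicon_polarity("I salute the troops"))
```

A hand-maintained phrase list like this does not scale well, which is one reason to look at a trained classifier instead.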

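Also, you check for a retweet count greater than 0:

if tweet.retweet_count > 0:

But that check means nothing unless you measure the rate of retweets over a period of time.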
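As a rough sketch of what measuring retweets over time could look like, the hypothetical helper below divides the retweet count by the tweet's age in hours. The function name `retweets_per_hour` and the one-minute age floor are assumptions of mine. It also assumes the tweet's creation time is available as a `datetime`, and treats a timestamp without a timezone as UTC.

```python
from datetime import datetime, timezone


def retweets_per_hour(retweet_count, created_at, now=None):
    """Retweets divided by the tweet's age in hours (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    if created_at.tzinfo is None:
        # Assumption: naive timestamps are in UTC.
        created_at = created_at.replace(tzinfo=timezone.utc)
    # Floor the age at one minute so brand-new tweets do not divide by zero.
    age_hours = max((now - created_at).total_seconds() / 3600.0, 1.0 / 60.0)
    return retweet_count / age_hours


created = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)
checked = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(retweets_per_hour(10, created, checked))  # 10 retweets over 2 hours
```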

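For these reasons, it is hard for your algorithm to work well. I recommend doing more research on bag-of-words models, word ranking, and tokenization.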

You can check this link for details: https://www.pluralsight.com/guides/building-a-twitter-sentiment-analysis-in-python
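To give a feel for tokenization plus a trained bag-of-words model, here is a small self-contained sketch of a multinomial Naive Bayes classifier in plain Python. It is not taken from the linked guide, and the class, the `tokenize` function, and the six training sentences are all invented for illustration. TextBlob ships its own `textblob.classifiers.NaiveBayesClassifier`, which would be the more practical route. This version only shows the mechanics, including word pairs (bigrams) so that "middle finger" can count as a single feature.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split into word tokens, and add adjacent-word bigrams."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    bigrams = [a + "_" + b for a, b in zip(words, words[1:])]
    return words + bigrams


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, labelled_texts):
        for text, label in labelled_texts:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)

    def classify(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data, for illustration only.
TRAINING = [
    ("he gave the crowd a middle finger", "negative"),
    ("a middle finger salute to the referee", "negative"),
    ("what an awful rude gesture", "negative"),
    ("we salute the brave firefighters", "positive"),
    ("a proud salute to our veterans", "positive"),
    ("great respect for the team", "positive"),
]

classifier = NaiveBayes()
classifier.train(TRAINING)
print(classifier.classify("someone gave him a middle finger salute"))
```

With real data you would need many more labelled tweets than this, and a held-out set to check accuracy.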

Good luck and regards.
