I am having trouble doing a sentiment analysis of tweets (file 1, standard Twitter JSON response) against a list of words (file 2, tab-delimited, two columns) with a sentiment assigned to each word (either positive or negative).
The problem: I loop through file 1 and, nested within that loop, loop through file 2, comparing words and keeping a running sum of the combined sentiment for each tweet. The top loop only runs once and then the script ends.
So I have:
import json

def get_sentiments(tweet_file, sentiment_file):
    sent_score = 0
    for line in tweet_file:
        document = json.loads(line)
        tweets = document.get('text')
        if tweets != None:
            tweet = str(tweets.encode('utf-8'))
            #print tweet
            for z in sentiment_file:
                line = z.split('\t')
                word = line[0].strip()
                score = int(line[1].rstrip('\n').strip())
                #print score
                if word in tweet:
                    print "+++++++++++++++++++++++++++++++++++++++"
                    print word, tweet
                    sent_score += score
            print "====", sent_score, "====="
            #PROBLEM, IT'S ONLY DOING THIS FOR THE FIRST TWEET

file1 = open('tweetsfile.txt')
file2 = open('sentimentfile.txt')
get_sentiments(file1, file2)
I've spent the better part of a day trying to figure out why it prints all the tweets when the nested for loop over file2 is removed, but with the nested loop it only processes the first tweet and then exits.
The reason it's only doing it once is that the inner for loop has reached the end of the sentiment file, so it stops since there are no more lines to read.
In other words, for the first tweet the inner loop steps through the entire sentiment file. For every later tweet the file is already at its end, so the inner loop has nothing left to read and its body never runs again, resulting in only the first tweet being scored.
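The exhaustion described above can be reproduced in a few lines. This is only a minimal sketch in Python 3: io.StringIO stands in for a real open file, and the two sample lines are made up for the demonstration.

```python
import io

# io.StringIO is a stand-in for a file opened with open() (an assumption
# for this demo); iterating it behaves the same way as iterating a file.
sentiment_file = io.StringIO("good\t1\nbad\t-1\n")

# The first pass reads every line and leaves the position at end of file.
first_pass = [line for line in sentiment_file]

# The second pass starts at end of file, so it yields nothing.
second_pass = [line for line in sentiment_file]

print(len(first_pass), len(second_pass))
```

The second list is empty for the same reason the nested loop in the question does nothing after the first tweet.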
So one way to solve this is to "rewind" the file before each pass; you can do that with the seek method of the file object, for example sentiment_file.seek(0).
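A rough sketch of the rewind approach, written for Python 3 and under a few assumptions: the sample data is invented, io.StringIO stands in for the two open files, and the score is reset for each tweet (the question's code keeps one running total instead). The substring test with "in" is kept from the question's code.

```python
import io
import json

def get_sentiments(tweet_file, sentiment_file):
    """Return one total score per tweet, rewinding sentiment_file each time."""
    totals = []
    for line in tweet_file:
        text = json.loads(line).get('text')
        if text is None:
            continue  # not every JSON line carries a 'text' field
        sentiment_file.seek(0)  # rewind so the inner loop sees every line again
        total = 0
        for entry in sentiment_file:
            word, score = entry.rstrip('\n').split('\t')
            if word.strip() in text:  # substring match, as in the question
                total += int(score)
        totals.append(total)
    return totals

# Made-up sample data standing in for tweetsfile.txt and sentimentfile.txt.
tweets = io.StringIO('{"text": "good day"}\n{"text": "bad good day"}\n{"id": 1}\n')
sentiments = io.StringIO("good\t1\nbad\t-2\n")
result = get_sentiments(tweets, sentiments)
print(result)
```

This works, but it re-reads and re-parses the sentiment file once per tweet, which gets slow as either file grows.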
If your files aren't big, another approach is to read them all into a list or similar structure and then loop through it.
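As a sketch of the list approach (Python 3, with a hypothetical helper named load_sentiments and invented sample data), the sentiment file is parsed once and the resulting list can be looped over as many times as needed:

```python
import io

def load_sentiments(sentiment_file):
    """Read every 'word<TAB>score' line into a list of (word, score) tuples."""
    pairs = []
    for entry in sentiment_file:
        word, score = entry.rstrip('\n').split('\t')
        pairs.append((word.strip(), int(score)))
    return pairs

# io.StringIO stands in for open('sentimentfile.txt') in this demo.
sentiments = load_sentiments(io.StringIO("good\t1\nbad\t-2\n"))

# A list, unlike a file object, can be iterated again for every tweet.
texts = ["good day", "bad good day"]
totals = [sum(score for word, score in sentiments if word in text)
          for text in texts]
print(totals)
```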
However, since your sentiment score is a simple lookup, the best approach would be to build a dictionary with the sentiment scores, then look up each word in the dictionary to calculate the overall sentiment of the tweet:
import csv
import json

scores = {}  # empty dictionary to store scores for each word

with open('sentimentfile.txt') as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        scores[row[0].strip()] = int(row[1].strip())

with open('tweetsfile.txt') as f:
    for line in f:
        tweet = json.loads(line)
        text = tweet.get('text','').encode('utf-8')
        if text:
            # add up the score of every known word; unknown words count as 0
            total_sentiment = sum(scores.get(word,0) for word in text.split())
            print("{}: {}".format(text,total_sentiment))
The with statement automatically closes the files when the block ends. I am using the csv module to read the sentiment file (it works for tab-delimited files as well).
This line does the calculation:
total_sentiment = sum(scores.get(word,0) for word in text.split())
It is a shorter way to write this loop:
tweet_score = []
for word in text.split():
    if word in scores:
        tweet_score.append(scores[word])
total_sentiment = sum(tweet_score)
The get method of dictionaries takes an optional second argument to return a custom value when the key cannot be found; if you omit this second argument, it returns None. In my loop I am using it to return 0 if the word has no score.
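A small illustration of that behaviour, using a made-up scores dictionary and sentence:

```python
# Hypothetical scores, just for the demonstration.
scores = {"good": 1, "bad": -2}

found = scores.get("good")          # key exists, so its value is returned
missing = scores.get("meh")         # key absent, no default given
defaulted = scores.get("meh", 0)    # key absent, default supplied

# Words without a score contribute 0 to the sum instead of raising KeyError.
total = sum(scores.get(word, 0) for word in "a good but bad day".split())
print(found, missing, defaulted, total)
```

Without the default of 0, sum would fail on the first unknown word, because None cannot be added to an integer.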