
Extracting tweets from Twitter using Tweepy

After successfully appending tweets to my CSV file, I saw that the tweets were truncated, with extra text inserted at the point where they were cut off.

For example, the original tweet looks like this:

Career in Risk Management Some of the programmes and qualifications in the field are:

  1. GARP's Financial Risk Management (FRM) Certification
  2. IRM's Enterprise Risk Management (ERM) Qualification
  3. MBA/Masters in Risk Management

My tweet has a body like this: Career in Risk Management\n\nSome of the programmes and qualifications in the field are:\n\n1. GARP\xe2\x80\x99s Financial Risk Ma\xe2\x80\xa6 (add link here).

Any idea how I can solve this problem?

Sharing my code here:

import csv
import tweepy

auth = tweepy.OAuthHandler('xxxx', 'xxxx')
auth.set_access_token('xxxx', 'xxxx')
api = tweepy.API(auth)
search_words = "jobs"  # enter your words
new_search = search_words + " -filter:retweets"
csvFile = open('jobs.csv', 'a')
csvWriter = csv.writer(csvFile)
for tweet in tweepy.Cursor(api.search, q=new_search, count=100, lang="en", since_id=0).items():
    csvWriter.writerow([tweet.created_at, tweet.text.encode('utf8'), tweet.user.screen_name.encode('utf-8'), tweet.favorite_count, tweet.retweet_count, tweet.truncated, tweet.user.location.encode('utf-8'), tweet.source])

What's happening here is that you're also capturing the special characters. \n is a common one and is simply a line break. My first thought was the .split() function, which does remove the character but splits the string into a list. The .replace() function is a better fit. Since it returns a new string rather than changing the original, assign the result back. To get rid of the line break characters it would look like this:

tweetToCut = tweetToCut.replace('\n', '')

That gets rid of the line breaks. You'd have to do this for every character you want to remove, but the calls can be chained:

tweetToCut = tweetToCut.replace('\n', '').replace('\xe2', '')

Keep in mind that the characters you want to remove are part of the tweet's formatting. If you only need the plain text, it's fine to remove them. If you want to preserve the formatting, I recommend keeping them unless you plan to reformat the tweets yourself.
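As an alternative to chaining .replace() calls, here is a minimal sketch that assumes Python 3. The \xe2\x80\x99 and \xe2\x80\xa6 sequences in the question are the UTF-8 bytes of ’ and …, which show up because an encoded bytes object was written to the file. Decoding them (or not calling .encode() in the first place) restores the characters, and a single regular expression can collapse the line breaks. The helper name flatten_tweet_text is made up for this illustration.

```python
import re


def flatten_tweet_text(text):
    """Collapse line breaks and other runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


# Shortened bytes modelled on the example in the question.
raw_bytes = b"Career in Risk Management\n\n1. GARP\xe2\x80\x99s Financial Risk Ma\xe2\x80\xa6"

# Decoding turns the \xe2\x80\x99 / \xe2\x80\xa6 byte sequences back into ’ and ….
decoded = raw_bytes.decode("utf-8")

print(flatten_tweet_text(decoded))
# Career in Risk Management 1. GARP’s Financial Risk Ma…
```

This keeps the apostrophe and ellipsis readable instead of deleting them. It does not restore the truncated text, which is a separate issue.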

Looks like you're requesting standard Tweets and not handling extended Tweets (longer than 140 characters). Pass tweet_mode="extended" to the search call:

for tweet in tweepy.Cursor(api.search, q=new_search, count=100, lang="en", tweet_mode="extended", since_id=0).items():

You'll also need to get tweet.full_text instead of tweet.text in the CSV storage line.
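Putting both changes together, a rough sketch of the CSV export might look like the following. It assumes Python 3 and the same Tweepy version as the question, where the search method is api.search (newer Tweepy releases renamed it to search_tweets). The function names tweet_to_row and export_tweets are made up for this example, and the small stand-in object at the end only exists so the row-building part can be tried without network access.

```python
import csv
from types import SimpleNamespace


def tweet_to_row(tweet):
    """Build one CSV row from a Tweepy Status-like object."""
    # In extended mode the untruncated body is in full_text;
    # fall back to text for objects fetched in the default mode.
    body = getattr(tweet, "full_text", None) or getattr(tweet, "text", "")
    return [
        tweet.created_at,
        body,
        tweet.user.screen_name,
        tweet.favorite_count,
        tweet.retweet_count,
        tweet.user.location,
        tweet.source,
    ]


def export_tweets(api, query, path="jobs.csv"):
    """Append search results to a CSV file (needs valid credentials and network access)."""
    import tweepy  # imported here so tweet_to_row can be tried without Tweepy installed

    # newline="" avoids blank rows on Windows; utf-8 keeps characters such as ’ intact,
    # so no .encode() calls are needed on the individual fields.
    with open(path, "a", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        cursor = tweepy.Cursor(api.search, q=query, count=100, lang="en",
                               tweet_mode="extended")
        for tweet in cursor.items():
            writer.writerow(tweet_to_row(tweet))


# Offline check with a stand-in object instead of a real API response.
fake_tweet = SimpleNamespace(
    created_at="2020-01-01 00:00:00",
    full_text="Career in Risk Management\n\n1. GARP\u2019s Financial Risk Management (FRM) Certification",
    user=SimpleNamespace(screen_name="example_user", location="Mumbai"),
    favorite_count=3,
    retweet_count=1,
    source="Twitter Web App",
)
row = tweet_to_row(fake_tweet)
```

With real credentials, the call would be export_tweets(api, "jobs -filter:retweets"). Opening the file in text mode with an explicit encoding is what removes the need for the .encode() calls that produced the \xe2\x80\x99 sequences.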
