
Save full text of a tweet with tweepy

I am a novice Python programmer. I am having trouble extracting the text of a series of tweets with tweepy and saving it to a text file (I omit the authentication and related setup):

search = api.search("hello", count=10)

textlist=[]

for i in range(0,len(search)):
    textlist.append( search[i].text.replace('\n', '' ) )

f = open('temp.txt', 'w')
for i in range(0,len(textlist)):
    f.write(textlist[i].encode('utf-8') + '\n')
f.close()

But in some long tweets the text is truncated and an ellipsis ("...") appears at the end of the string, so I sometimes lose links or hashtags. How can I avoid this?

With tweepy, you can get the full text using tweet_mode='extended' (not documented in the Tweepy doc). For instance:

(not extended)

print(api.get_status('862328512405004288')._json['text'])

@tousuncotefoot @equipedefrance @CreditAgricole @AntoGriezmann @KMbappe @layvinkurzawa @UmtitiSam J'ai jamais vue d… https://tco/kALZ2ki9Vc

(extended)

print(api.get_status('862328512405004288', tweet_mode='extended')._json['full_text'])

@tousuncotefoot @equipedefrance @CreditAgricole @AntoGriezmann @KMbappe @layvinkurzawa @UmtitiSam J'ai jamais vue de match de foot et cela ferait un beau cadeau pour mon copain !! 🙏🏻🙏🏻🙏🏻😍😍
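A minimal sketch of how this could be applied to the search results from the question. The helper name `get_full_text` is hypothetical, and the stand-in objects only mimic the `text` and `full_text` attributes of tweepy status objects. The commented lines at the end assume that `api` is an authenticated `tweepy.API` instance and that `api.search` accepts `tweet_mode` the same way `get_status` does.

```python
from types import SimpleNamespace


def get_full_text(status):
    """Return the untruncated text when available, else the short text."""
    # In extended mode the status carries `full_text`; otherwise only `text`.
    full = getattr(status, 'full_text', None)
    return full if full is not None else status.text


# Stand-ins for tweepy status objects (hypothetical data, no network access).
extended = SimpleNamespace(full_text='a long tweet with #hashtag')
compat = SimpleNamespace(text='a long tweet wi...')

print(get_full_text(extended))
print(get_full_text(compat))

# With a real, authenticated `api` (assumption), usage might look like:
# for status in api.search('hello', count=10, tweet_mode='extended'):
#     print(get_full_text(status))
```

Falling back to `text` keeps the same helper usable whether or not the request was made in extended mode.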

The ellipsis (...) is added when the tweet is a retweet and is therefore truncated. This is mentioned in the documentation:

Indicates whether the value of the text parameter was truncated, for example, as a result of a retweet exceeding the 140 character Tweet length. Truncated text will end in ellipsis, like this ...

There is no way to avoid this unless you take each individual tweet, search for any retweets of it, and build the complete timeline. Obviously this isn't practical for a simple search, though you could do it if you were fetching a particular handle's timeline.

You can also simplify your code:

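import io

results = api.search('hello', count=10)

# io.open handles the UTF-8 encoding on both Python 2 and Python 3.
with io.open('temp.txt', 'w', encoding='utf-8') as f:
    for tweet in results:
        # Write the tweet's text, not the status object itself.
        f.write(u'{}\n'.format(tweet.text))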

This is the default behaviour for retweets. You can access the full text under the retweeted_status object.

Twitter API entities section about the change:

https://dev.twitter.com/overview/api/entities-in-twitter-objects#retweets

Twitter API documentation (look for "truncated")

https://dev.twitter.com/overview/api/tweets
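As a rough sketch of the `retweeted_status` approach: the helper name `original_text` is hypothetical, and the stand-in objects below only imitate the attributes a tweepy status would carry (`text`, optionally `full_text`, and `retweeted_status` on retweets). No real API call is made.

```python
from types import SimpleNamespace


def original_text(status):
    """Prefer the text of the original tweet when `status` is a retweet."""
    # Retweets carry the original tweet under `retweeted_status`;
    # for ordinary tweets we fall back to the status itself.
    source = getattr(status, 'retweeted_status', status)
    # `full_text` is only present in extended mode; otherwise use `text`.
    full = getattr(source, 'full_text', None)
    return full if full is not None else source.text


# Stand-ins for tweepy status objects (hypothetical data).
original = SimpleNamespace(text='the complete original tweet with a link')
retweet = SimpleNamespace(text='RT @user: the complete original tw...',
                          retweeted_status=original)
plain = SimpleNamespace(text='not a retweet')

print(original_text(retweet))
print(original_text(plain))
```

Reading from the original tweet sidesteps the truncation that the "RT @user:" prefix introduces on the retweet itself.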

