Save full text of a tweet with tweepy

I am a novice Python programmer. I am having trouble extracting the text of a series of tweets with tweepy and saving it to a text file (I omit the authentication and so on):

search = api.search("hello", count=10)

textlist = []

for i in range(0, len(search)):
    textlist.append(search[i].text.replace('\n', ''))

# write one tweet per line; utf-8 so emoji and accented characters survive
with open('temp.txt', 'w', encoding='utf-8') as f:
    for i in range(0, len(textlist)):
        f.write(textlist[i] + '\n')

But in some long tweets the text at the end is truncated and a three-dot character "..." appears at the end of the string, so I sometimes lose links or hashtags. How can I avoid this?

With tweepy, you can get the full text using tweet_mode='extended' (not documented in the Tweepy docs). In extended mode the text is returned in the full_text field instead of text. For instance:

(not extended)

print(api.get_status('862328512405004288')._json['text'])

@tousuncotefoot @equipedefrance @CreditAgricole @AntoGriezmann @KMbappe @layvinkurzawa @UmtitiSam J'ai jamais vue d… https://t.co/kALZ2ki9Vc

(extended)

print(api.get_status('862328512405004288', tweet_mode='extended')._json['full_text'])

@tousuncotefoot @equipedefrance @CreditAgricole @AntoGriezmann @KMbappe @layvinkurzawa @UmtitiSam J'ai jamais vue de match de foot et cela ferait un beau cadeau pour mon copain !! 🙏🏻🙏🏻🙏🏻😍😍
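If some of your tweets come from extended mode and some do not, a small helper can read whichever field is present. This is a minimal sketch, assuming you work with the raw JSON dict Tweepy keeps in `status._json` (as in the examples above); the helper name `tweet_text` is hypothetical.

```python
from typing import Any, Dict


def tweet_text(tweet_json: Dict[str, Any]) -> str:
    """Return the untruncated text if present, else the (possibly truncated) text.

    Assumes `tweet_json` is the raw dict Tweepy stores in `status._json`.
    With tweet_mode='extended' the text is under 'full_text'; otherwise under 'text'.
    """
    if "full_text" in tweet_json:
        return tweet_json["full_text"]
    return tweet_json.get("text", "")


# Usage with Tweepy (network call, shown for context only):
# status = api.get_status('862328512405004288', tweet_mode='extended')
# print(tweet_text(status._json))
```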

The ... (ellipsis) is added when the tweet is part of a retweet (and is thus truncated). This is mentioned in the documentation:

Indicates whether the value of the text parameter was truncated, for example, as a result of a retweet exceeding the 140 character Tweet length. Truncated text will end in ellipsis, like this ...

There is no way to avoid this unless you take each individual tweet, search for any retweets of it and build the complete timeline (obviously this isn't practical for a simple search, but you could do it if you were fetching a particular handle's timeline).
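Even where truncation cannot be avoided, it can at least be detected. Rather than looking for a trailing "..." in the string (in the truncated example earlier the ellipsis is followed by a link, so it is not at the very end), a sketch like the following relies on the documented `truncated` field. It assumes a list of raw tweet dicts, for example `[s._json for s in results]`; the function name `split_truncated` is hypothetical.

```python
from typing import Any, Dict, List, Tuple

TweetJson = Dict[str, Any]


def split_truncated(tweets: List[TweetJson]) -> Tuple[List[TweetJson], List[TweetJson]]:
    """Partition raw tweet dicts by the API's `truncated` flag.

    A missing flag is treated as "not truncated".
    Returns (truncated, complete), each preserving the input order.
    """
    truncated = [t for t in tweets if t.get("truncated", False)]
    complete = [t for t in tweets if not t.get("truncated", False)]
    return truncated, complete
```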

You can also simplify your code:

results = api.search('hello', count=10)

with open('temp.txt', 'w', encoding='utf-8') as f:
    for tweet in results:
        # one tweet per line, with embedded newlines removed as in the question
        f.write('{}\n'.format(tweet.text.replace('\n', '')))

This is the default behaviour for retweets. You can access the full text under the retweeted_status object.
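Combining this with extended mode, a sketch for the question's file-writing loop might look like the following. It assumes the statuses were fetched with tweet_mode='extended' (so they carry `full_text`) and that retweets expose the original under `retweeted_status`; the names `status_full_text` and `save_full_texts` are hypothetical, and passing tweet_mode to the search call is an assumption based on the get_status example (in Tweepy 4.x the search method is named `search_tweets`).

```python
from typing import Any, Iterable, TextIO


def status_full_text(status: Any) -> str:
    """Return the complete text of a Tweepy-style status object.

    Assumes the status has `full_text` (tweet_mode='extended') and that,
    for retweets, the untruncated original is under `retweeted_status`.
    """
    original = getattr(status, "retweeted_status", None)
    if original is not None:
        # The retweet's own text ("RT @user: ...") may be cut short;
        # the embedded original carries the complete text.
        return original.full_text
    return status.full_text


def save_full_texts(statuses: Iterable[Any], f: TextIO) -> None:
    """Write one tweet per line, flattening embedded newlines to spaces."""
    for status in statuses:
        f.write(status_full_text(status).replace("\n", " ") + "\n")


# Usage (network call, shown for context only):
# results = api.search('hello', count=10, tweet_mode='extended')
# with open('temp.txt', 'w', encoding='utf-8') as f:
#     save_full_texts(results, f)
```

Handling the retweet case explicitly matters because, even in extended mode, the retweet's own `full_text` is prefixed with "RT @user: " and can still be truncated.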

Twitter API entities section about the change:

https://dev.twitter.com/overview/api/entities-in-twitter-objects#retweets

Twitter API documentation (look for "truncated"):

https://dev.twitter.com/overview/api/tweets
