Beautifulsoup for loop doesn't get all the elements

I want to scrape tweets from some Twitter posts using the BeautifulSoup library. I want to get the original post and all of its replies, if there are any. I managed to get the original post, and I wrote this loop to get all the replies, but it returns only the first one. Any help would be appreciated, thanks!

from bs4 import BeautifulSoup
import urllib.request

url= "https://twitter.com/20Minutes/status/692778440211169280"

list_Original_message =[]

readfile=urllib.request.urlopen(url).read()
soup = BeautifulSoup(readfile, "html.parser")  # name the parser explicitly to avoid the bs4 warning

# ... the first part of my script scrapes the original post;
# I omit it because it works!

# loop to get the replies :

replies = soup.find_all('ol',{"class":'stream-items js-navigable-stream'})
for m in replies :
    name = m.findAll('strong',class_="fullname js-action-profile-name show-popup-with-id")[0]
    print(name.string)
    profile = m.findAll('span',class_="username js-action-profile-name")[0]
    print(profile.get_text())
    link = m.findAll('a',class_="tweet-timestamp js-permalink js-nav js-tooltip")[0]['href']
    print('https://twitter.com'+link)
    time = m.findAll('a',class_="tweet-timestamp js-permalink js-nav js-tooltip")[0]['title']
    print(time)
    message = m.findAll('p',class_="TweetTextSize js-tweet-text tweet-text")[0]
    print (message.get_text())

This is the result I get, only the first reply:

Mais l'eau dit

@Queen_MeloMau

https://twitter.com/Queen_MeloMau/status/692797851139710978

11:54 AM - 28 Jan 2016

@20Minutes dites moi que c'est une blagounette la @slavicdelrey

Only the first few tweets are actually sent in response to your original request; the rest are loaded asynchronously. Use the Twitter APIs, they're there for a reason.
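
Separately from the asynchronous loading, note that the loop in the question iterates over the matched `ol` containers and takes element `[0]` of each `findAll` result, so it can print at most one reply per container even when several are present in the HTML. A minimal sketch of iterating over the individual items instead follows. The markup, the class names `stream-item`, `fullname` and `tweet-text`, and the helper `extract_replies` are all hypothetical stand-ins for illustration; the real Twitter HTML differs and changes over time.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for a fetched page.
html = """
<ol class="stream-items js-navigable-stream">
  <li class="stream-item">
    <strong class="fullname">Alice</strong>
    <p class="tweet-text">first reply</p>
  </li>
  <li class="stream-item">
    <strong class="fullname">Bob</strong>
    <p class="tweet-text">second reply</p>
  </li>
</ol>
"""

soup = BeautifulSoup(html, "html.parser")


def extract_replies(soup):
    """Return a (name, text) pair for every reply item found in the page."""
    results = []
    # Outer loop: the container(s) holding the replies.
    for container in soup.find_all("ol", class_="stream-items"):
        # Inner loop: each individual reply inside the container.
        for item in container.find_all("li", class_="stream-item"):
            name = item.find("strong", class_="fullname")
            text = item.find("p", class_="tweet-text")
            if name is None or text is None:
                continue  # skip items that lack the expected fields
            results.append((name.get_text(strip=True),
                            text.get_text(strip=True)))
    return results


replies = extract_replies(soup)
for name, text in replies:
    print(name, text)
```

This only helps with replies that are already in the HTML you downloaded; anything loaded later by JavaScript will still be missing, which is why the API remains the better route.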
