
How to scrape dynamically generated data from this website?

Here is the website I want to scrape: http://www.quickbid.com.tw/

I would like to get the contents of the elements with class="timestamp" into a Python variable so that I can parse the timestamp however I like.

I've tried using Scrapy to scrape the timestamp, but because Scrapy does not execute JavaScript, I cannot get the JavaScript-generated data.
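To see why Scrapy (or any tool that only parses the downloaded HTML) misses the value, here is a minimal sketch using Python's standard-library `html.parser` on a hypothetical page fragment. `SAMPLE_HTML` and its `--:--:--` placeholder are made up for illustration; the point is that the parser returns whatever text the server sent, not what a script would later write into the span.

```python
from html.parser import HTMLParser


class TimestampCollector(HTMLParser):
    """Collect the text inside every <span class="timestamp"> element."""

    def __init__(self):
        super().__init__()
        self.in_timestamp = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "timestamp" in classes:
            self.in_timestamp = True
            self.values.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_timestamp = False

    def handle_data(self, data):
        # Only text that sits inside a timestamp span is recorded
        if self.in_timestamp:
            self.values[-1] += data


# Hypothetical page: the real value would be written by a script after load
SAMPLE_HTML = """
<span class="timestamp">--:--:--</span>
<script>
  setInterval(function () { /* JS would update the span here */ }, 1000);
</script>
"""

collector = TimestampCollector()
collector.feed(SAMPLE_HTML)
print(collector.values)  # ['--:--:--']
```

The script is never run, so only the placeholder comes back; getting the live value needs either a browser that executes the JavaScript or a direct call to whatever the script fetches.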

I also tried using Firebug to monitor the traffic between quickbid and my browser. I found that requests are sent every second to synchronize the timestamp, but I still don't know how these requests are generated. I heard that Selenium might help me reach my goal, but after reading the Selenium tutorials (http://www.jroller.com/selenium/), I still have no idea how to scrape the data I want.
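If the per-second requests seen in Firebug return the timestamp directly, one option is to skip the page and call that endpoint yourself. A minimal sketch, assuming the endpoint answers a plain GET with JSON. `POLL_URL` is hypothetical (copy the real request URL, plus any headers or parameters it needs, from Firebug's Net panel), and the standard-library `urllib.request` stands in for requests so the sketch has no extra dependencies.

```python
import json
import time
import urllib.request

# HYPOTHETICAL: replace with the request URL that repeats every second in Firebug
POLL_URL = "http://www.quickbid.com.tw/hypothetical-sync-endpoint"


def fetch_json(url, timeout=10.0):
    """Fetch one polling response and decode it as JSON (assumes a JSON body)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll(url, times, interval=1.0, fetch=fetch_json):
    """Call `fetch(url)` `times` times, `interval` seconds apart, and collect the results."""
    results = []
    for i in range(times):
        results.append(fetch(url))
        if i < times - 1:
            time.sleep(interval)  # mimic the page's once-per-second sync
    return results

# Usage (network access required): responses = poll(POLL_URL, times=5)
```

The `fetch` parameter is there so a different fetcher, for example one that sends the cookies or headers the site expects, can be swapped in without touching the polling loop.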

Does anyone know how to scrape data from this website? Any help will be appreciated.

I usually use the basic requests and BeautifulSoup libraries for scraping. I did this:

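import requests
from bs4 import BeautifulSoup

# requests only downloads the raw HTML; it does not run the page's JavaScript
r = requests.get("http://www.quickbid.com.tw/")
soup = BeautifulSoup(r.content, "html.parser")

# Collect every <span class="timestamp"> present in that static HTML
timestamps = soup.find_all("span", {"class": "timestamp"})
print(timestamps)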

It returned:

[<span class="timestamp">Save91%</span>, <span class="timestamp">Save84%</span>, <span class="timestamp">Save96%</span>, <span class="timestamp">Save99%</span>, <span class="timestamp">Save82%</span>]
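In that static HTML the `span.timestamp` elements hold discount labels such as `Save91%` rather than clock values. If those labels are useful anyway, here is a small sketch for turning their text (for example from `span.get_text()` in BeautifulSoup) into integers. The `Save<number>%` pattern is inferred only from the five values shown above; anything that does not match is skipped.

```python
import re


def parse_savings(texts):
    """Turn labels like 'Save91%' into integers; non-matching labels are skipped."""
    out = []
    for t in texts:
        m = re.fullmatch(r"Save(\d+)%", t.strip())
        if m:
            out.append(int(m.group(1)))
    return out


print(parse_savings(["Save91%", "Save84%", "Save96%", "Save99%", "Save82%"]))
# [91, 84, 96, 99, 82]
```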

Hope this is what you are looking for. 希望这是您想要的。

You can definitely do it with Selenium, and it would be quite easy. Selenium has bindings for many different programming languages, so just choose the one you know best and read the Selenium documentation for that language.

I personally use Python, and its bindings are quite easy to understand.

Here is the Selenium documentation for Python.
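A minimal sketch of the Selenium approach in Python, assuming Selenium 4 with Firefox and geckodriver installed. It loads the page in a real browser so the JavaScript runs, waits for `span.timestamp` elements, and reads their rendered text. Because the static HTML already contains those spans, the presence wait can succeed before the script has updated them, so a stricter wait condition may be needed in practice. `to_seconds` is a hypothetical helper that assumes an `HH:MM:SS` format, which the site does not confirm; adjust it to whatever the rendered text actually looks like.

```python
import re


def scrape_timestamps(url="http://www.quickbid.com.tw/", timeout=10):
    """Render the page in Firefox, then return the text of every span.timestamp."""
    # Imported here so to_seconds() below works even without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # assumes Firefox + geckodriver are available
    try:
        driver.get(url)
        # Wait until at least one matching element exists in the rendered DOM
        WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, "span.timestamp"))
        )
        return [el.text for el in driver.find_elements(By.CSS_SELECTOR, "span.timestamp")]
    finally:
        driver.quit()  # always close the browser, even on timeout


def to_seconds(text):
    """Parse an assumed 'HH:MM:SS' string into seconds; return None if it doesn't match."""
    m = re.fullmatch(r"(\d+):(\d{2}):(\d{2})", text.strip())
    if m is None:
        return None
    h, mi, s = (int(g) for g in m.groups())
    return h * 3600 + mi * 60 + s

# Usage (needs a browser): [to_seconds(t) for t in scrape_timestamps()]
```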

In the end, I used a Firefox add-on called Greasemonkey to scrape the website.

https://addons.mozilla.org/en-US/firefox/addon/greasemonkey/

Greasemonkey can capture the dynamically generated data on http://www.quickbid.com.tw/.
