
How do I scrape data generated with JavaScript using BeautifulSoup?

I'm trying to migrate some comments from a blog by web scraping with Python and BeautifulSoup. The content I'm looking for isn't in the HTML itself and seems to be generated by a script tag (which I can't find). I've seen some answers about this, but most of them are specific to a particular problem and I can't figure out how to apply them to my site. I'm just trying to scrape comments from pages like this one:

http://www.themasterpiececards.com/famous-paintings-reviewed/bid/92327/famous-paintings-duccio-s-maesta

I've also tried Selenium, but I'm currently using a Cloud9-based IDE and it doesn't seem to support web drivers.

I apologize if I botched any of the lingo; I'm pretty new to programming. If anyone has any tips, that would be helpful. Thanks!

There are several ways to scrape such content. One is to find out how the comments are loaded on this website. A quick look in the Chromium developer tools shows that the comments for the page mentioned are loaded via this API call.

This may not suit you, though, since you may not be able to generate this URL for every page.
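If the endpoint URL turns out to follow a pattern, it can be rebuilt per page. Below is a minimal sketch of that idea, not the site's actual API: the endpoint template is a made-up placeholder (the real URL has to be copied from the browser's Network tab), and it is only an assumption that the number after `/bid/` in the post URL is the ID the endpoint expects.

```python
import re
import urllib.request

# Hypothetical endpoint template: replace it with the request URL you see in
# the browser's Network tab. The real URL is not given in this thread.
COMMENTS_URL_TEMPLATE = "https://example.com/api/comments?post_id={post_id}"


def extract_post_id(page_url):
    """Return the numeric ID that follows '/bid/' in a post URL, or None."""
    match = re.search(r"/bid/(\d+)/", page_url)
    return match.group(1) if match else None


def build_comments_url(page_url, template=COMMENTS_URL_TEMPLATE):
    """Fill the (assumed) endpoint template with the post ID from page_url."""
    post_id = extract_post_id(page_url)
    if post_id is None:
        raise ValueError(f"no post ID found in {page_url!r}")
    return template.format(post_id=post_id)


def fetch_comments_raw(page_url, timeout=10):
    """Download the raw response body of the comments endpoint as text."""
    api_url = build_comments_url(page_url)
    with urllib.request.urlopen(api_url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

Whatever `fetch_comments_raw` returns (JSON or an HTML fragment, depending on the site) can then be handed to `json.loads` or BeautifulSoup. If no pattern links the page URL to the endpoint, this approach does not generalize.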

Another, more reliable way is to render the JS content with a headless (GUI-less) browser. For ease of implementation I would suggest using Scrapy with Splash. Splash is a JavaScript rendering service with an HTTP API (itself written in Python) that renders most of the content for your requests.
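As a rough sketch of the rendering approach, the code below calls Splash's `render.html` HTTP endpoint directly instead of going through Scrapy, then parses the rendered page with BeautifulSoup. It assumes a Splash instance is already running locally on port 8050, that the response is UTF-8, and that comments match the CSS selector `.comment`; that selector is a guess and has to be replaced after inspecting the rendered HTML.

```python
import urllib.parse
import urllib.request

# Assumption: Splash is running locally on its default port.
SPLASH_ENDPOINT = "http://localhost:8050/render.html"


def splash_render_url(page_url, wait=2.0, endpoint=SPLASH_ENDPOINT):
    """Build the Splash request URL; 'wait' gives scripts time to run (seconds)."""
    query = urllib.parse.urlencode({"url": page_url, "wait": wait})
    return f"{endpoint}?{query}"


def fetch_rendered_html(page_url, wait=2.0, timeout=30):
    """Ask Splash to load the page, run its JavaScript, and return the HTML."""
    request_url = splash_render_url(page_url, wait)
    with urllib.request.urlopen(request_url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")  # assumed encoding


def extract_comments(html, selector=".comment"):
    """Return the text of every element matching the (hypothetical) selector."""
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

    soup = BeautifulSoup(html, "html.parser")
    return [node.get_text(" ", strip=True) for node in soup.select(selector)]
```

Typical use would be `extract_comments(fetch_rendered_html(page_url))`. Inside a Scrapy project, the scrapy-splash package wraps the same service so that spiders receive the rendered response.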

