What's the best way to scrape the Disqus comment count in Scrapy?
I'm just getting started with Scrapy and am interested in the best practices for this situation. Scrapy is designed to select elements on the page using either CSS or XPath. Disqus comments appear to load in an iframe, making them harder to scrape. I know they have an API, but is there a way to scrape them using XPath/CSS or some other easy selector?
Here's an example post: http://www.ibtimes.com/who-aaron-ybarra-suspected-seattle-pacific-university-shooter-obsessed-columbine-1595326
I tried just using the XPath of the Disqus comment count, but that didn't appear to work:
In [36]: sel.xpath('//*[@id="main-nav"]/nav/ul/li[1]/a/span[1]').extract()
Out[36]: []
Is there some other way to get the count? What is the best strategy here?
Disqus is embedded in an iframe on third-party websites. By reading the iframe's "src" attribute, you can follow that link and then proceed as normal.
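A minimal sketch of the "read the iframe src, then follow it" idea, using only the standard library so it runs without Scrapy installed. The names `IframeSrcCollector` and `iframe_urls`, the sample markup, and the `disqus.example` host are all made up for illustration. It also assumes the iframe is present in the downloaded HTML; if the page injects it with JavaScript, the list will simply come back empty.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcCollector(HTMLParser):
    """Collect the src attribute of every <iframe> start tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; attrs is a list of (name, value).
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def iframe_urls(html, page_url):
    """Return absolute URLs for all iframes found in html."""
    collector = IframeSrcCollector()
    collector.feed(html)
    # Resolve relative and scheme-relative ("//host/...") URLs.
    return [urljoin(page_url, src) for src in collector.sources]


# Hypothetical markup, not taken from the example post.
sample = (
    '<div id="disqus_thread">'
    '<iframe src="//disqus.example/embed/?t=1"></iframe>'
    "</div>"
)
print(iframe_urls(sample, "http://www.example.com/post"))
```

Inside a Scrapy callback, the equivalent would probably be `response.xpath('//iframe/@src').getall()` to collect the URLs and `response.follow(src, callback=...)` to request each one, after which the second callback can use ordinary selectors on the iframe's own document.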
You would need to use a headless browser. Try using a package such as scrapy-selenium.
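As a rough sketch, enabling scrapy-selenium usually comes down to a few lines in the project's settings.py. The setting names below follow the scrapy-selenium README as I understand it; the choice of Firefox with geckodriver is an assumption, so adjust it to whichever browser and driver you actually have installed.

```python
# settings.py (fragment): assumes Firefox and geckodriver are installed.
from shutil import which

SELENIUM_DRIVER_NAME = "firefox"
# which() returns None if geckodriver is not on PATH.
SELENIUM_DRIVER_EXECUTABLE_PATH = which("geckodriver")
# Run the browser without opening a window.
SELENIUM_DRIVER_ARGUMENTS = ["-headless"]

# Route requests through the Selenium downloader middleware.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_selenium.SeleniumMiddleware": 800,
}
```

With this in place, a spider would yield `SeleniumRequest(url=..., callback=...)` (imported from `scrapy_selenium`) instead of a plain `scrapy.Request`, so the callback receives the page after the browser has run its JavaScript. Content inside a cross-origin iframe may still need separate handling.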