
What's the best way to scrape a Disqus comment count with Scrapy?

I'm just getting started with Scrapy and am interested in the best practices for this situation. Scrapy selects elements on a page using either CSS or XPath. Disqus comments appear to load in an iframe, which makes them harder to scrape. I know Disqus has an API, but is there a way to scrape the comments using XPath/CSS or some other easy selector?

Here's an example post: http://www.ibtimes.com/who-aaron-ybarra-suspected-seattle-pacific-university-shooter-obsessed-columbine-1595326

I tried using the XPath of the Disqus comment count, but it returned nothing:

In [36]: sel.xpath('//*[@id="main-nav"]/nav/ul/li[1]/a/span[1]').extract()
Out[36]: []

Is there some other way to get the count? What is the best strategy here?

Disqus is embedded in an iframe on third-party websites. If you read the iframe's "src" attribute, you can follow that link and then scrape the resulting page as normal.

Disqus renders its comments with JavaScript, which Scrapy does not execute on its own, so you would need to use a headless browser. Try a package such as scrapy-selenium.
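
As a rough sketch of what enabling scrapy-selenium looks like, the fragment below follows the settings described in that package's README. The choice of Firefox with geckodriver and the headless flag are assumptions for illustration; the package may also need a Selenium version it is compatible with, so treat this as a starting point rather than a verified setup.

```python
# settings.py (fragment) for a Scrapy project using scrapy-selenium.
from shutil import which

# Assumption: Firefox driven by geckodriver found on the PATH.
SELENIUM_DRIVER_NAME = "firefox"
SELENIUM_DRIVER_EXECUTABLE_PATH = which("geckodriver")
# Run the browser without opening a window.
SELENIUM_DRIVER_ARGUMENTS = ["-headless"]

# Route requests through the Selenium downloader middleware.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_selenium.SeleniumMiddleware": 800,
}
```

With this in place, a spider would yield scrapy_selenium.SeleniumRequest objects instead of plain scrapy.Request objects, and the callback receives the page as rendered by the browser.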
