Python Scrapy - Issues with scraping data that is commented out

Question

After hours of troubleshooting, I finally determined that the reason I couldn't scrape this data is that the most vital data is commented out in the HTML, and JavaScript must be loading it. Printing the response does show the data, but Scrapy will not pull it.

Answer 1

XPath has the comment() node test, which selects comment nodes.

But it returns the comment as plain text, so you have to remove the <!-- and --> markers and parse the rest as HTML before you can search inside it. In Scrapy you can use the Selector() class to parse it.

Minimal working code

from scrapy.selector import Selector

sel = Selector(text='''
<div>
<!--
<div class="outer">
<div class="inner">Hello World</div>
</div>
-->
</div>''')

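# //comment() matches comment nodes; get() returns the first one as text, markers included
comment = sel.xpath('//comment()').get()
print(comment)

# Strip the leading '<!--' (4 characters) and the trailing '-->' (3 characters)
#html = comment.replace('<!--', '').replace('-->', '')
html = comment[4:-3]
print(html)

# Parse the extracted HTML with a new Selector so it can be queried normally
sel = Selector(text=html)

divs = sel.xpath('//div').getall()
print(divs)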

Result:

<!--
<div class="outer">
<div class="inner">Hello World</div>
</div>
-->

<div class="outer">
<div class="inner">Hello World</div>
</div>

['<div class="outer">\n<div class="inner">Hello World</div>\n</div>', '<div class="inner">Hello World</div>']

solution1: 2 ACCEPTED 2020-06-29 06:35:29