XPath has comment() to select comment nodes. But it gives you the comment as normal text, so you have to remove the <!-- and --> delimiters and then parse the remaining string to search inside that HTML. In Scrapy you can use the Selector class to parse it.
Minimal working code:
from scrapy.selector import Selector
sel = Selector(text='''
<div>
<!--
<div class="outer">
<div class="inner">Hello World</div>
</div>
-->
</div>''')
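# Get the first comment node as a string, including the <!-- and --> delimiters
comment = sel.xpath('//comment()').get()
print(comment)
# Strip the delimiters: '<!--' is 4 characters, '-->' is 3 characters
#html = comment.replace('<!--', '').replace('-->', '')
html = comment[4:-3]
print(html)
# Parse the extracted HTML so it can be queried like any other document
sel = Selector(text=html)
divs = sel.xpath('//div').getall()
print(divs)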
Result:
<!--
<div class="outer">
<div class="inner">Hello World</div>
</div>
-->
<div class="outer">
<div class="inner">Hello World</div>
</div>
['<div class="outer">\n<div class="inner">Hello World</div>\n</div>', '<div class="inner">Hello World</div>']