Python Scrapy - Problem scraping data that is commented out

Question

After hours of troubleshooting, I finally determined why I couldn't scrape this data: the most vital part of the page is commented out in the HTML, and JavaScript must be loading it. Printing the response does show it, but Scrapy will not extract that data.

Answer 1

XPath has comment() to select comment nodes.

However, it returns the comment as plain text, so you have to remove the <!-- and --> delimiters and parse the result before you can search inside that HTML. In Scrapy you can use the Selector class to parse it.

Minimal working code

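from scrapy.selector import Selector

# Sample page: the interesting markup sits inside an HTML comment.
sel = Selector(text='''
<div>
<!--
<div class="outer">
<div class="inner">Hello World</div>
</div>
-->
</div>''')

# comment() selects the comment node; get() returns it as a string, delimiters included.
comment = sel.xpath('//comment()').get()
print(comment)

# Strip the leading '<!--' (4 characters) and the trailing '-->' (3 characters).
# Alternative: comment.replace('<!--', '').replace('-->', '')
html = comment[4:-3]
print(html)

# Parse the extracted text as HTML so it can be queried like any other page.
sel = Selector(text=html)

divs = sel.xpath('//div').getall()
print(divs)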

Result:

<!--
<div class="outer">
<div class="inner">Hello World</div>
</div>
-->

<div class="outer">
<div class="inner">Hello World</div>
</div>

['<div class="outer">\n<div class="inner">Hello World</div>\n</div>', '<div class="inner">Hello World</div>']

Answer 1 (2, accepted, 2020-06-29 06:35:29)
