Scrapy selectors cannot return the desired characters, possibly because of JavaScript

Question

I'm trying to scrape data from this Chinese webpage: http://bxt.harbin.gov.cn/hrb_bzbxt/disshow.php?id=551950

In the Scrapy shell, I cannot get any text from any td element. For example, response.xpath("/html/body/center[2]/table/tbody/tr[2]/td[3]/text()").extract() returns an empty list, and other similar commands return the same thing. When I inspect the HTML more closely, I find this in the head element: script language="javascript". I'm not sure whether this is the cause of the problem. Could anybody help me figure it out? I searched Stack Overflow for related topics, but they are too complex for me to grasp. Thank you for your help!

Answer 1

The problem here is that you are using a full path to get to the information you want. This isn't necessary, so there is no need to follow html -> body -> center, etc.

You could go directly to the td information you need, with something like:

response.xpath('//td/text()')

which will return a list of selectors (one for every text node inside a td tag) that you can iterate over to get the information you need.

Answer 1: accepted, 2015-10-28 20:21:05
