Scrapy selector cannot return desired characters, possibly due to JavaScript

I'm trying to scrape data from this Chinese webpage: http://bxt.harbin.gov.cn/hrb_bzbxt/disshow.php?id=551950

In the Scrapy shell, I cannot get any text from any td element. For example, response.xpath("/html/body/center[2]/table/tbody/tr[2]/td[3]/text()").extract() returns an empty list, and other similar commands also return empty lists. When I inspected the HTML more closely, I found this in the head element: <script language="javascript">. I'm not sure whether this is the cause of the problem. Could anybody help me figure it out? I searched Stack Overflow for related topics, but the answers were too complex for me to grasp. Thank you for your help!
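Scrapy does not execute JavaScript, so one way to test this hypothesis is to check whether the td text already appears in the raw HTML the server returns. The sketch below is a minimal, stdlib-only check; `TdTextCollector`, `diagnose` and the sample markup are hypothetical, written for illustration, and the sample is not the real page. It also reports whether the source contains a `<tbody>` tag, because browsers add one to the DOM they display when the markup omits it, so an XPath copied from the developer tools can contain a step that the raw HTML lacks.

```python
from html.parser import HTMLParser


class TdTextCollector(HTMLParser):
    """Hypothetical helper: collect non-blank text inside <td> cells and record which tags appear."""

    def __init__(self):
        super().__init__()
        self.td_depth = 0        # > 0 while the parser is inside a <td>
        self.td_texts = []       # stripped text chunks found in <td> cells
        self.seen_tags = set()   # every start-tag name seen (HTMLParser lowercases them)

    def handle_starttag(self, tag, attrs):
        self.seen_tags.add(tag)
        if tag == "td":
            self.td_depth += 1

    def handle_endtag(self, tag):
        if tag == "td" and self.td_depth > 0:
            self.td_depth -= 1

    def handle_data(self, data):
        # Keep only text that sits inside a cell; skip whitespace between tags.
        if self.td_depth > 0 and data.strip():
            self.td_texts.append(data.strip())


def diagnose(raw_html):
    """Hypothetical helper: report what the raw, un-rendered HTML actually contains."""
    parser = TdTextCollector()
    parser.feed(raw_html)
    parser.close()
    return {
        "has_script": "script" in parser.seen_tags,
        "has_tbody": "tbody" in parser.seen_tags,
        "td_texts": parser.td_texts,
    }


# Hypothetical stand-in for a server response: a <script> in <head> and no <tbody> in the table.
sample = """<html><head><script language="javascript">var x = 1;</script></head>
<body><center>header</center><center><table>
<tr><td>Name</td><td>Date</td><td>Status</td></tr>
<tr><td>A</td><td>B</td><td>C</td></tr>
</table></center></body></html>"""

print(diagnose(sample))
# {'has_script': True, 'has_tbody': False, 'td_texts': ['Name', 'Date', 'Status', 'A', 'B', 'C']}
```

In the Scrapy shell you would pass `response.text` instead of `sample`. If `td_texts` is non-empty, the data is already in the static HTML and the script tag is not what hides it; if it is empty while the browser shows the table, the cells are probably filled in by JavaScript.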

The problem here is that you are using a full path to get to the information you want. This isn't necessary, so there is no need to follow html > body > center, etc.

You can go directly to the td elements you need with something like:

response.xpath('//td/text()')

which returns a list of selectors (one for each text node inside a td tag) that you can iterate over to get the information you need.
