Scraping with Nokogiri::HTML - Can't get text from XPath
I'm trying to scrape HTML with Nokogiri. This is the HTML source:
<span id="J_WlAreaInfo" class="wl-areacon">
<span id="J-From">山东济南</span>
至
<span id="J-To">
<span id="J_WlAddressInfo" class="wl-addressinfo" title="全国">
全国
<s></s>
</span>
</span>
</span>
I need to get the following text: 山东济南
I checked the shortest XPath with Firebug:
//*[@id="J-From"]
Here is my Ruby code:
doc = Nokogiri::HTML(open("http://foo.html"), "UTF-8")
area = doc.xpath('//*[@id="J-From"]')
puts area.text
However, it returns nothing. What am I doing wrong?
xpath() returns an array-like collection containing the matches (it's actually called a NodeSet):
require 'nokogiri'
html = %q{
<span id="J_WlAreaInfo" class="wl-areacon">
<span id="J-From">山东济南</span>
至
<span id="J-To">
<span id="J_WlAddressInfo" class="wl-addressinfo" title="全国">
全国
<s></s>
</span>
</span>
</span>
}
doc = Nokogiri::HTML(html)
target_tags = doc.xpath('//*[@id="J-From"]')
target_tags.each do |target_tag|
  puts target_tag.text
end
--output:--
山东济南
Edit: You can actually call text() on the NodeSet, but it returns the concatenated text of every match, which is not something I've ever found useful. Because there is only one match here, you should have gotten the result 山东济南. There is nothing in your post that indicates why you didn't get that result.
If you only want a single result from your XPath, i.e. the first match, then you can use at_xpath():
target_tag = doc.at_xpath('//*[@id="J-From"]')
puts target_tag.text
--output:--
山东济南