
Nokogiri Node Set

I am trying to use Nokogiri to scrape a web page. Right now, I can set a variable, links, to the following elements on the page:

links = page.css('.item_inner')

and links is a:

Nokogiri::XML::NodeSet

Then I iterate through this NodeSet (links):

links.each { |link| puts link.css('.details a') }

I do this to get some more information. But now the class of the method above's result is:

Fixnum

and it returns a list of something. I'm not sure exactly what it is returning, but it looks like a list of these:

<a se:clickable:target="true" href="/nyc/sale/1056207-coop-150-sullivan-street-soho-new-york?featured=1">150 Sullivan Street #34</a>

I know there are key/value pairs in there, but I can't access them at this point. How can I access, say, the href and the actual name?

Once you have a single link as a node, its href is link['href'] (and likewise for its other attributes), and the link text ("150 Sullivan Street #34") is its content.

NOTE: A css search always yields what is effectively an array of found nodes (actually a NodeSet). If you are sure your search will find only one match, you can skip past the NodeSet by using at_css instead. It returns a single node: the first match, or nil if nothing matches.
