Using Nokogiri and Ruby to get links and href text from an HTML doc?

Question

I'm trying to use the nokogiri gem to extract all the URLs on a page, along with their link text, and store each link text and URL pair in a hash.

<html>
    <body>
        <a href=#foo>Foo</a>
        <a href=#bar>Bar </a>
    </body>
</html>

I would like to return:

{"Foo" => "#foo", "Bar" => "#bar"}

Answer 1

Here's a one-liner:

Hash[doc.xpath('//a[@href]').map {|link| [link.text.strip, link["href"]]}]

#=> {"Foo"=>"#foo", "Bar"=>"#bar"}

Split up a bit, which is arguably more readable:

h = {}
doc.xpath('//a[@href]').each do |link|
  h[link.text.strip] = link['href']
end
puts h

#=> {"Foo"=>"#foo", "Bar"=>"#bar"}

Answer 2

Another way:

h = doc.css('a[href]').each_with_object({}) { |n, acc| acc[n.text.strip] = n['href'] }
# yields {"Foo"=>"#foo", "Bar"=>"#bar"}

And if you're worried that the same link text might point to different URLs, you can collect the hrefs in arrays:

h = doc.css('a[href]').each_with_object(Hash.new { |hash, k| hash[k] = [] }) { |n, acc| acc[n.text.strip] << n['href'] }
# yields {"Foo"=>["#foo"], "Bar"=>["#bar"]}
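To see why the array version matters, here is a small sketch of the collision case. The grouping step does not depend on Nokogiri, so it uses a hard-coded array of `[text, href]` pairs (hypothetical data, including the made-up `#foo2` target) in place of what `doc.css('a[href]')` would produce on a page where "Foo" appears twice:

```ruby
# Hypothetical [text, href] pairs standing in for the anchors on a page
# where the link text "Foo" is used for two different targets.
pairs = [['Foo', '#foo'], ['Bar', '#bar'], ['Foo', '#foo2']]

# Plain hash: the later "Foo" entry silently overwrites the earlier one.
flat = pairs.to_h

# Hash of arrays: every href seen for a given text is kept, in order.
grouped = pairs.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |(text, href), hash|
  hash[text] << href
end

p flat
p grouped
```

With the plain hash, only one URL per link text survives; the hash of arrays keeps all of them at the cost of every value being an array, even for unique texts.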

Answer 1
14, accepted, 2012-02-17 22:31:24

Answer 2
2, 2012-02-17 22:35:12
