Nokogiri text node contents

Question

Is there a clean way to get the contents of a text node with Nokogiri? Right now I'm using

some_node.at_xpath( "//whatever" ).first.content

which seems really verbose just to get at the text.

Answer 1

Do you want only the text?

doc.search('//text()').map(&:text)

Maybe you don't want all the whitespace and noise. If you want only the text nodes that contain a word character:

doc.search('//text()').map(&:text).delete_if{|x| x !~ /\w/}

Edit: It looks like you only wanted the text content of a single node:

some_node.at_xpath( "//whatever" ).text

Answer 2

Just look for the text nodes:

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<html>
<body>
<p>This is a text node </p>
<p> This is another text node</p>
</body>
</html>
EOT

doc.search('//text()').each do |t|
  t.replace(t.content.strip)
end

puts doc.to_html

Which outputs:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<p>This is a text node</p>
<p>This is another text node</p>
</body></html>

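By the way, your code sample doesn't work. at_xpath( "//whatever" ).first is redundant and will fail. at_xpath finds only the first match and returns a Node. first would be superfluous even if it worked, but it won't, because Node doesn't have a first method.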

I have <data><foo>bar</foo></bar>, how do I get at the "bar" text without doing doc.xpath_at( "//data/foo" ).children.first.content?

Assuming doc contains the parsed DOM:

doc.to_xml # => "<?xml version=\"1.0\"?>\n<data>\n  <foo>bar</foo>\n</data>\n"

The first occurrence:

doc.at('foo').text       # => "bar"
doc.at('//foo').text     # => "bar"
doc.at('/data/foo').text # => "bar"

Get all occurrences and take the first:

doc.search('foo').first.text      # => "bar"
doc.search('//foo').first.text    # => "bar"
doc.search('data foo').first.text # => "bar"
