Nokogiri text node contents

Is there any clean way to get the contents of text nodes with Nokogiri? Right now I'm using

some_node.at_xpath( "//whatever" ).first.content

which seems really verbose for just getting text.

You want only the text?

doc.search('//text()').map(&:text)

Maybe you don't want all the whitespace and noise. If you want only the text nodes containing a word character,

doc.search('//text()').map(&:text).delete_if{|x| x !~ /\w/}

Edit: It appears you only wanted the text content of a single node:

some_node.at_xpath( "//whatever" ).text

Just look for text nodes:

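require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<p>This is a text node </p>
<p> This is another text node</p>
EOT

# Strip leading and trailing whitespace from every text node in place.
doc.search('//text()').each do |t|
  t.content = t.content.strip
end

puts doc.to_html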

Which outputs the following (Nokogiri::HTML also wraps the paragraphs in <html><body> tags, omitted here):

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<p>This is a text node</p>
<p>This is another text node</p>

BTW, your code example doesn't work. In at_xpath( "//whatever" ).first, the first is redundant and will fail. at_xpath finds only the first occurrence and returns a single Node, not a NodeSet, so there is no list of matches for first to pick from, and the chain raises a NoMethodError rather than returning the text.

I have <data><foo>bar</foo></data>. How do I get at the "bar" text without doing doc.at_xpath( "//data/foo" ).children.first.content?

Assuming doc contains the parsed DOM:

doc.to_xml # => "<?xml version=\"1.0\"?>\n<data>\n  <foo>bar</foo>\n</data>\n"

Get the first occurrence:

doc.at('foo').text       # => "bar"
doc.at('//foo').text     # => "bar"
doc.at('/data/foo').text # => "bar"

Get all occurrences and take the first one:

doc.search('foo').first.text      # => "bar"
doc.search('//foo').first.text    # => "bar"
doc.search('data foo').first.text # => "bar"
