
Parsing just the content in HTML nodes via Nokogiri in Ruby

Suppose I have parsed the following line of HTML:

<a href="http://www.google.com" class="blah"><img src="logo.png" border="0"></img><br><span class="red">Go to google!</span></a>

This is just an example, but how would I go about stripping everything EXCEPT the following:

http://www.google.com
logo.png
Go to google!

Also, is it possible to search for wildcards?

If you can use a gem, this becomes a very simple job. I would recommend the Mechanize gem. Reference: http://mechanize.rubyforge.org/Mechanize.html

Maybe like this:

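require 'nokogiri'
doc = Nokogiri::HTML '<a href="http://www.google.com" class="blah"><img src="logo.png" border="0"></img><br><span class="red">Go to google!</span></a>'
# Select every href attribute, every src attribute, and every text node, then convert each to a string
doc.xpath('//*/@href|//*/@src|//*/text()').map(&:to_s)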
