Parsing just the content in HTML nodes via Nokogiri in Ruby
Suppose I have parsed the following line of HTML:
<a href="http://www.google.com" class="blah"><img src="logo.png" border="0"></img><br><span class="red">Go to google!</span></a>
This is just an example, but how would I go about stripping everything EXCEPT the following:
http://www.google.com
logo.png
Go to google!
Also, is it possible to search using wildcards?
If you can use gems, this becomes a very simple job. I would recommend the Mechanize gem. Reference: http://mechanize.rubyforge.org/Mechanize.html
Maybe like this:
require 'nokogiri'
doc = Nokogiri::HTML('<a href="http://www.google.com" class="blah"><img src="logo.png" border="0"></img><br><span class="red">Go to google!</span></a>')
# Collect every href attribute, src attribute, and text node, in document order
doc.xpath('//*/@href|//*/@src|//*/text()').map(&:to_s)