I've read a couple of articles and posts on stackoverflow surrounding this topic. I apologize if I am repeating someone else's post on stack. Is there a way to iterate through the HTML source code of a given URL and return the text of a header tag?
Example:
<h2 class='title'>
<a href="/blog/step-by-step-guide-to-building-your-first-ruby-gem">Step-by-Step Guide to Building Your First Ruby Gem</a>
</h2>
The code looks for the
tag and returns Step-by-Step Guide to Building Your First Ruby Gem. I know there's the Nokogiri gem that searches for nodes in a xpath: doc.xpath('//h3/a').each do |link| puts link.content end
Is there one where I could potentially do
doc.html('h1').each do |tag| puts link.content end
I hope it makes sense...any insight of direction to a resource will be much appreciated.
Nokogiri has both XPath and CSS accessors, so you can do
doc.css('h1 > a').each do |tag| puts link.content end
if you don't like XPath. (Or just 'h1'
- I am not 100% sure if you want the text of links in headers, or headers themselves).
You can use the this library to build xpaths, it is easier to use
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.