Grabbing the text from HTML source code of URL using Ruby

I've read a couple of articles and posts on Stack Overflow about this topic, so I apologize if I am repeating someone else's question. Is there a way to iterate through the HTML source code of a given URL and return the text of a header tag?

Example:

<h2 class='title'>
<a href="/blog/step-by-step-guide-to-building-your-first-ruby-gem">Step-by-Step Guide to Building Your First Ruby Gem</a>
</h2>

The code looks for the <h2> tag and returns Step-by-Step Guide to Building Your First Ruby Gem. I know there's the Nokogiri gem, which searches for nodes using an XPath expression:

 doc.xpath('//h3/a').each do |link| puts link.content end

Is there something similar where I could potentially do

doc.html('h1').each do |tag| puts tag.content end

I hope that makes sense. Any insight or direction to a resource would be much appreciated.

Nokogiri has both XPath and CSS accessors, so you can do

doc.css('h1 > a').each do |tag| puts tag.content end

if you don't like XPath. (Or just 'h1' - I am not 100% sure if you want the text of links in headers, or headers themselves).

You can use this library to build XPaths; it is easier to use.
