Grabbing the text from HTML source code of URL using Ruby

Question

I've read a couple of articles and posts on Stack Overflow about this topic, so I apologize if I am repeating someone else's question. Is there a way to iterate through the HTML source code of a given URL and return the text of a header tag?

Example:

<h2 class='title'>
<a href="/blog/step-by-step-guide-to-building-your-first-ruby-gem">Step-by-Step Guide to Building Your First Ruby Gem</a>
</h2>

The code looks for the <h2> tag and returns Step-by-Step Guide to Building Your First Ruby Gem. I know there's the Nokogiri gem, which searches for nodes using an XPath expression:

 doc.xpath('//h3/a').each do |link|
   puts link.content
 end

Is there a similar method that would let me do something like this?

doc.html('h1').each do |tag| puts tag.content end

I hope this makes sense. Any insight or pointer to a resource would be much appreciated.

Answer 1

Nokogiri has both XPath and CSS accessors, so you can do

doc.css('h1 > a').each do |tag| puts tag.content end

if you don't like XPath. (Or just 'h1' - I am not 100% sure if you want the text of links in headers, or headers themselves).

Answer 2

You can use this library to build XPath expressions; it is easier to use.
