How could I use a regex expression in Ruby to ignore HTML tags?

Question

I have a Blog ActiveRecord model in a Rails application. Its body attribute is a text column that contains HTML tags for images, headings, etc. I want to create a method that takes the first n characters of the body to show on the index page as preview text.

The problem is that my method also grabs all of the HTML tags, so it returns a string that looks like this:

Here is a picture I am talking about. <img src="path/to/image.png" / > <h1> Nice </h1>

Is there a regex solution that ignores all the tags?

Answer 1

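Rails has a strip_tags helper (from ActionView::Helpers::SanitizeHelper) that removes HTML tags and keeps the text between them. It is available directly in views; outside a view, for example in the model, it can be called as ActionController::Base.helpers.strip_tags(body).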

strip_tags("Strip <i>these</i> tags!")
# => Strip these tags!

strip_tags("<b>Bold</b> no more!  <a href='more.html'>See more here</a>...")
# => Bold no more!  See more here...

strip_tags("<div id='top-bar'>Welcome to my website!</div>")
# => Welcome to my website!
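strip_tags only removes the markup. The question also needs the first n characters for the index page, and the stripped text can contain runs of whitespace where tags used to be. Below is a minimal plain-Ruby sketch, assuming the input has already been passed through strip_tags; the `preview` helper and its `max_chars`/`omission` parameters are hypothetical, not a Rails API.

```ruby
# Hypothetical helper (not a Rails API): turn already-stripped text into a
# short preview of at most max_chars characters, including the omission marker.
def preview(plain_text, max_chars = 100, omission = "...")
  raise ArgumentError, "max_chars must exceed the omission length" if max_chars <= omission.length

  # Collapse the runs of whitespace that removed tags leave behind.
  text = plain_text.gsub(/\s+/, " ").strip
  return text if text.length <= max_chars

  # Leave room for the omission marker.
  limit = max_chars - omission.length
  cut = text[0, limit]

  # If the cut lands mid-word, back up to the previous space
  # (a single very long word is simply hard-cut).
  unless text[limit] == " "
    last_space = cut.rindex(" ")
    cut = cut[0, last_space] if last_space
  end

  cut.rstrip + omission
end

# Example with the output of one of the strip_tags calls above:
puts preview("Bold no more!  See more here...", 20) # prints "Bold no more! See..."
```

Inside a Rails app, ActiveSupport's String#squish and String#truncate (with its separator: " " option) cover roughly the same ground, e.g. strip_tags(blog.body).squish.truncate(200, separator: " ").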

Answer 2

You don't want to use a regex to strip the tags; HTML is too complex, and you have better things to do with your time than maintain a regex. Here's a non-Rails solution using Nokogiri:

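require 'nokogiri'

# Sample body text from the question
text = 'Here is a picture I am talking about. <img src="path/to/image.png" / > <h1> Nice </h1>'

# Parse as a fragment (no <html>/<body> wrapper) and read back only the text nodes
doc = Nokogiri::HTML::DocumentFragment.parse(text)
doc.text # => "Here is a picture I am talking about.   Nice "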
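To make the "HTML is too complex" point concrete, here is the kind of regex the question asks about, in plain Ruby (the `naive_strip` name is hypothetical). It happens to work on the question's sample, producing the same text as the Nokogiri version, but a `>` inside a quoted attribute value ends the match early and leaks attribute text into the result. A parser reads the quoted value as one unit, so it does not have this problem.

```ruby
# Naive tag-stripping regex: delete everything from "<" up to the next ">".
NAIVE_TAG = /<[^>]*>/

def naive_strip(html)
  html.gsub(NAIVE_TAG, "")
end

sample = 'Here is a picture I am talking about. <img src="path/to/image.png" / > <h1> Nice </h1>'
puts naive_strip(sample).inspect
# => "Here is a picture I am talking about.   Nice "

# A ">" inside a quoted attribute value ends the "tag" too early,
# so part of the attribute leaks into the output.
tricky = '<img alt="a > b" src="x.png">Caption'
puts naive_strip(tricky).inspect
# => " b\" src=\"x.png\">Caption"
```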

2 answers
Answer 1 (accepted): score 4, 2016-09-26 15:02:50
Answer 2: score 0, 2016-09-26 17:50:03