
How can I use a regex in Ruby to ignore HTML tags?

I have a Blog ActiveRecord model in a Rails application. The body attribute is text and includes HTML tags for images, headings, etc. I want to create a method that takes the first n characters of the body to show on the index page as preview text.

The problem is that my method also grabs all of the HTML tags, so it returns a string that looks like this:

Here is a picture I am talking about. <img src="path/to/image.png" / > <h1> Nice </h1>

Is there a regex solution to ignore all the tags?

Rails has a strip_tags helper method:

strip_tags("Strip <i>these</i> tags!")
# => Strip these tags!

strip_tags("<b>Bold</b> no more!  <a href='more.html'>See more here</a>...")
# => Bold no more!  See more here...

strip_tags("<div id='top-bar'>Welcome to my website!</div>")
# => Welcome to my website!

You don't want to try using regex to strip the tags; HTML is too complex, and you have better things to do with your time than maintain a regex. Here's a non-Rails solution using Nokogiri:

require 'nokogiri'

text = 'Here is a picture I am talking about. <img src="path/to/image.png" / > <h1> Nice </h1>'

doc = Nokogiri::HTML::DocumentFragment.parse(text)
doc.text # => "Here is a picture I am talking about.   Nice "


 