
How can I convert “<p>A</p>,<p>B</p>,<p>C</p>” into an array?

I want to convert <p>A</p>,<p>B</p>,<p>C</p> into an array like:

["A", "B", "C"]

I tried .scan(/(<p>)(.*?)(<\/p>)/i). What is the most convenient and robust way to do this in Ruby, with better fault tolerance?

Use Regex on Your String Fragment

If you know your HTML tags will always be lowercase, and each paragraph body will always be a single capital letter, then this will work:

"<p>A</p>,<p>B</p>,<p>C</p>".scan /\p{Upper}/
#=> ["A", "B", "C"]

This certainly works for your posted input, but it is brittle: the pattern matches every uppercase letter in the string, wherever it appears.
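To make the brittleness concrete, here is a small sketch. The helper name capitals_only and the second and third inputs are hypothetical, not from the original post; they are only there to show where the pattern stops returning whole paragraph bodies.

```ruby
# Hypothetical helper wrapping the same pattern as the answer above.
def capitals_only(html)
  html.scan(/\p{Upper}/)
end

capitals_only("<p>A</p>,<p>B</p>,<p>C</p>")
#=> ["A", "B", "C"]

# Longer bodies lose everything except their capital letters.
capitals_only("<p>Hello</p>,<p>World</p>")
#=> ["H", "W"]

# Uppercase tag names are matched as well.
capitals_only("<P>A</P>,<P>B</P>")
#=> ["P", "A", "P", "P", "B", "P"]
```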

Use Nokogiri on Your HTML Fragment

Since you have an HTML fragment, you should really use a parser. For example:

require 'nokogiri'

doc = Nokogiri::HTML::DocumentFragment.parse "<p>A</p>,<p>B</p>,<p>C</p>"
doc.xpath('p').map(&:text)
#=> ["A", "B", "C"]

Unless your input is truly pathological, Nokogiri will reliably extract the text nodes from your paragraph tags even if the input varies, and will ignore extraneous characters outside the nodes such as the commas in your string fragment.

Try something like this:

"<p>A</p>,<p>B</p>,<p>C</p>".gsub(/<p>|<\/p>/,'').split(',')

This removes the <p> and </p> tags and then splits the resulting string on commas into an array.
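One caveat worth checking before relying on this: it assumes the opening tags are exactly <p> and that commas only appear between paragraphs. The sketch below wraps the same expression in a hypothetical helper, strip_and_split, and runs it on two made-up inputs that break those assumptions.

```ruby
# Hypothetical helper wrapping the gsub/split expression from the answer above.
def strip_and_split(html)
  html.gsub(/<p>|<\/p>/, '').split(',')
end

strip_and_split("<p>A</p>,<p>B</p>,<p>C</p>")
#=> ["A", "B", "C"]

# A comma inside a paragraph body is treated as a separator.
strip_and_split("<p>A, B</p>,<p>C</p>")
#=> ["A", " B", "C"]

# An opening tag with attributes is not removed.
strip_and_split('<p class="x">A</p>,<p>B</p>')
#=> ["<p class=\"x\">A", "B"]
```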

After editing your regex to remove the parentheses around the <p> tags, and then flattening the result, I got the output you are after.

"<p>A</p>,<p>B</p>,<p>C</p>".scan(/<p>(.*?)<\/p>/i).flatten

This yields ["A", "B", "C"] .

http://ideone.com/bfDtGc
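If you stay with a regex and want a bit more fault tolerance, the scan approach can be loosened. This is a minimal sketch, not a replacement for a real parser: the helper name paragraph_texts and the second input are hypothetical, and nested or malformed markup will still trip it up.

```ruby
# Hypothetical helper: a more forgiving variant of the scan approach.
def paragraph_texts(html)
  # <p\b[^>]*> accepts attributes on the opening tag without matching <pre>.
  # The i flag ignores tag case; the m flag lets "." span newlines.
  html.scan(/<p\b[^>]*>(.*?)<\/p\s*>/im).flatten.map(&:strip)
end

paragraph_texts("<p>A</p>,<p>B</p>,<p>C</p>")
#=> ["A", "B", "C"]

# Attributes, uppercase tags, commas in the body, and newlines are tolerated.
paragraph_texts("<P class=\"x\">A, B</P>,\n<p>\n  C\n</p>")
#=> ["A, B", "C"]
```

Because the capture group takes the paragraph body as a whole, commas inside a body no longer matter, and strip removes the surrounding whitespace.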
