How can I convert “<p>A</p>,<p>B</p>,<p>C</p>” into an array?

Question

I want to convert <p>A</p>,<p>B</p>,<p>C</p> into an array like:

["A", "B", "C"]

I tried .scan(/(<p>)(.*?)(<\\/p>)/i). What's the most convenient and robust way to do this in Ruby, with more fault tolerance?

Answer 1

Use Regex on Your String Fragment

If you know your HTML tags will always be lowercase, and your paragraph bodies will always be capitals, then this will work:

"<p>A</p>,<p>B</p>,<p>C</p>".scan /\p{Upper}/
#=> ["A", "B", "C"]

but it will be brittle. This certainly works for your posted corpus, though.
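To make the brittleness concrete, here is a small sketch. The helper name `upper_letters` and the two extra inputs are made up for illustration and are not part of the question; they only show what happens once the two stated assumptions (lowercase tags, capital-only bodies) stop holding.

```ruby
# Hypothetical helper wrapping the one-liner above.
def upper_letters(fragment)
  fragment.scan(/\p{Upper}/)
end

posted     = "<p>A</p>,<p>B</p>,<p>C</p>"   # the posted corpus
mixed_case = "<p>Ab</p>,<p>Cd</p>"          # assumed input: bodies with lowercase letters
upper_tags = "<P>A</P>,<P>B</P>"            # assumed input: uppercase tag names

p upper_letters(posted)      #=> ["A", "B", "C"]
p upper_letters(mixed_case)  #=> ["A", "C"] (the lowercase letters are dropped)
p upper_letters(upper_tags)  #=> ["P", "A", "P", "P", "B", "P"] (tag names leak in)
```

The pattern matches single uppercase characters anywhere in the string, so it never looks at the tags at all; that is why it only fits input shaped exactly like the example.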

Use Nokogiri on Your HTML Fragment

Since you have an HTML fragment, you should really use a parser. For example:

require 'nokogiri'

doc = Nokogiri::HTML::DocumentFragment.parse "<p>A</p>,<p>B</p>,<p>C</p>"
doc.xpath('p').map(&:text)
#=> ["A", "B", "C"]

Unless your input is truly pathological, Nokogiri will reliably extract the text nodes from your paragraph tags even if the input varies, and will ignore extraneous characters outside the nodes such as the commas in your string fragment.

Answer 2

Try something like this:

"<p>A</p>,<p>B</p>,<p>C</p>".gsub(/<p>|<\/p>/,'').split(',')

This will remove <p> and </p>, then split the resulting string into an array.
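If slightly messier input is expected, the same strip-then-split idea can be loosened a little. The following is only a sketch: the method name `paragraphs_by_split` and the sample input are assumptions, not part of the answer.

```ruby
# Hypothetical helper: remove <p> tags, then split on commas.
def paragraphs_by_split(fragment)
  fragment
    .gsub(/<\/?p\b[^>]*>/i, '')  # drop <p>, </p>, <P class="x">, and similar
    .split(',')
    .map(&:strip)                # tolerate spaces around the commas
end

p paragraphs_by_split('<p>A</p>, <P class="x">B</P> ,<p>C</p>')
#=> ["A", "B", "C"]
```

This still splits on every comma, so a paragraph whose own text contains a comma would be broken into several elements.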

Answer 3

After editing your regex to remove the parentheses around the <p> tags, and then flattening the result, I got the output you are after.

"<p>A</p>,<p>B</p>,<p>C</p>".scan(/<p>(.*?)<\/p>/i).flatten

This yields ["A", "B", "C"].

http://ideone.com/bfDtGc
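As a brief illustration of why the flatten step is needed: when the pattern contains a capture group, String#scan returns one array per match, holding that match's captures. The variable names below are only for illustration.

```ruby
fragment = "<p>A</p>,<p>B</p>,<p>C</p>"

# One capture group, so each match becomes a one-element array.
nested = fragment.scan(/<p>(.*?)<\/p>/i)
p nested              #=> [["A"], ["B"], ["C"]]

# Flattening collapses the nesting into the desired array.
p nested.flatten      #=> ["A", "B", "C"]

# Equivalent here, since every match has exactly one capture.
p nested.map(&:first) #=> ["A", "B", "C"]
```

With the original three capture groups, each inner array would also contain the opening and closing tags, which is why removing the extra parentheses matters.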

solution1
4 2014-10-23 04:36:45

solution2
0 2014-10-23 02:45:33

solution3
0 2014-10-23 11:18:58
