How can I convert “<p>A</p>,<p>B</p>,<p>C</p>” into an array?

I want to convert <p>A</p>,<p>B</p>,<p>C</p> into an array like:

["A","B", "C"]

I tried .scan(/(<p>)(.*?)(<\/p>)/i). What's the most convenient and robust way to do this in Ruby, with more fault tolerance?

Use Regex on Your String Fragment

If you know your HTML tags will always be lowercase, and your paragraph bodies will always be capital letters, then this will work:

"<p>A</p>,<p>B</p>,<p>C</p>".scan /\p{Upper}/
#=> ["A", "B", "C"]

This certainly works for your posted corpus, but it is brittle: any other uppercase character in the input, including one inside a tag name, would also match.
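To make the brittleness concrete, here is a minimal sketch. The helper name upper_chars and the second input string are hypothetical (neither appears in the question); the point is only that \p{Upper} matches every uppercase character, wherever it sits.

```ruby
# Hypothetical helper: returns every uppercase character found in the string.
def upper_chars(html)
  html.scan(/\p{Upper}/)
end

# The posted corpus: one capital letter per paragraph, lowercase tags.
p upper_chars("<p>A</p>,<p>B</p>,<p>C</p>")  #=> ["A", "B", "C"]

# Hypothetical input with longer bodies and an uppercase tag name:
# the bodies are truncated to their capitals and the tag's "P" leaks in.
p upper_chars("<p>Hello</p>,<P>World</P>")   #=> ["H", "P", "W", "P"]
```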

Use Nokogiri on Your HTML Fragment

Since you have an HTML fragment, you should really use a parser. For example:

require 'nokogiri'

doc = Nokogiri::HTML::DocumentFragment.parse "<p>A</p>,<p>B</p>,<p>C</p>"
doc.xpath(?p).map &:text
#=> ["A", "B", "C"]

Unless your input is truly pathological, Nokogiri will reliably extract the text nodes from your paragraph tags even if the input varies, and will ignore extraneous characters outside the nodes, such as the commas in your string fragment.

Try something like this:

"<p>A</p>,<p>B</p>,<p>C</p>".gsub(/<p>|<\/p>/,'').split(',')

This removes the <p> and </p> tags and splits the resulting string into an array.
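As a rough illustration of what this approach assumes, the sketch below wraps the one-liner in a hypothetical helper, strip_and_split, and runs it on the posted string and on a made-up input with a comma inside a paragraph body. Because the split happens on every comma, that second input is divided in the wrong place.

```ruby
# Hypothetical helper wrapping the gsub/split one-liner.
def strip_and_split(html)
  html.gsub(/<p>|<\/p>/, '').split(',')
end

# The posted string: commas appear only between paragraphs.
p strip_and_split("<p>A</p>,<p>B</p>,<p>C</p>")  #=> ["A", "B", "C"]

# Hypothetical input with a comma inside a paragraph body:
# the body "A, B" is split into two elements.
p strip_and_split("<p>A, B</p>,<p>C</p>")        #=> ["A", " B", "C"]
```

The pattern is also case-sensitive, so it is only a good fit when the tags are known to be lowercase and the bodies contain no commas.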

After editing your regex to remove the parentheses around the <p> tags, and then flattening the result, I got the output you are after.

"<p>A</p>,<p>B</p>,<p>C</p>".scan(/<p>(.*?)<\/p>/i).flatten

This yields ["A", "B", "C"].
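The difference the parentheses make can be seen side by side. In this sketch the two method names are hypothetical, and the regex uses a single backslash before the slash (<\/p>), which is how a literal slash is escaped inside a Ruby regex literal. String#scan returns one array per match when the pattern has capture groups, so three groups give three-element arrays.

```ruby
# Hypothetical helper: the pattern from the question, with three capture groups.
def scan_all_groups(html)
  html.scan(/(<p>)(.*?)(<\/p>)/i)
end

# Hypothetical helper: only the body is captured, then the nesting is flattened.
def scan_body_only(html)
  html.scan(/<p>(.*?)<\/p>/i).flatten
end

html = "<p>A</p>,<p>B</p>,<p>C</p>"

p scan_all_groups(html)
#=> [["<p>", "A", "</p>"], ["<p>", "B", "</p>"], ["<p>", "C", "</p>"]]

p scan_body_only(html)
#=> ["A", "B", "C"]
```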

http://ideone.com/bfDtGc
