
How to match a string with complex delimiters (regex in Ruby)

I would like to match attribute pairs from a string similar to the one below:

<tag_name attra="#{t("a.b.c")}" attrb="aa a">

... should match on ...

attra="#{t("a.b.c")}" and attrb="aa a"

Thanks in advance, Marius

You could use a lookahead to detect whether a closing quote is part of the value or not, by checking whether it is followed by whitespace or '>':

ruby-1.8.7-p248 > s='<tag_name attra="#{t("a.b.c")}" attrb="aa a">'
=> "<tag_name attra=\"\#{t(\"a.b.c\")}\" attrb=\"aa a\">" 
ruby-1.8.7-p248 > s.scan /\w+=".*?"(?=\s|>)/
=> ["attra=\"\#{t(\"a.b.c\")}\"", "attrb=\"aa a\""] 

Of course, that won't work if an attribute value contains a quote followed by a space or a '>', so no matter how you look at it, it's a losing battle unless you escape those quotes inside the attribute values or preprocess them somehow. That's the reason why every language's string and regex syntax requires delimiters to be escaped or preprocessed when they appear inside the delimited value.
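To make the failure case concrete, here is a minimal sketch. The second input is a made-up variation on the question's tag (not from the original post) in which an inner quote happens to be followed by a space, so the lookahead stops too early.

```ruby
# Pattern from the answer above: a closing quote only counts
# if it is followed by whitespace or '>'.
ATTR_LOOKAHEAD = /\w+=".*?"(?=\s|>)/

# The question's input: inner quotes are followed by 'a' or ')',
# never by whitespace or '>', so both attributes come out whole.
GOOD_INPUT = '<tag_name attra="#{t("a.b.c")}" attrb="aa a">'
GOOD_MATCHES = GOOD_INPUT.scan(ATTR_LOOKAHEAD)
p GOOD_MATCHES

# Hypothetical input: the quote after "a" is followed by a space,
# so the first match is cut off in the middle of the Ruby code.
BAD_INPUT = '<tag_name attra="#{t("a" + "b")}" attrb="aa a">'
BAD_MATCHES = BAD_INPUT.scan(ATTR_LOOKAHEAD)
p BAD_MATCHES
```

With the second input the first element is truncated to attra="#{t("a", which is exactly the situation described above.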

If there were no quotation marks in the attribute values (like attrb="aa a"), or if the quotation marks were escaped as entities (like attrib="&quot;Hello,&quot; he said"), then this would be really easy to do with a regex along the lines of

/\w+="[^"]*"/
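As a quick check of that pattern, the sketch below runs it against an entity-escaped tag and against the tag from the question. The escaped tag is an invented example that combines the two attribute values mentioned above.

```ruby
# Simple pattern: an attribute value may contain anything except a double quote.
SIMPLE_ATTR = /\w+="[^"]*"/

# Quotes escaped as entities: both attributes are matched in full.
ESCAPED_INPUT = '<tag_name attrib="&quot;Hello,&quot; he said" attrb="aa a">'
ESCAPED_MATCHES = ESCAPED_INPUT.scan(SIMPLE_ATTR)
p ESCAPED_MATCHES

# The question's input: the first inner quote ends the match early.
RAW_INPUT = '<tag_name attra="#{t("a.b.c")}" attrb="aa a">'
RAW_MATCHES = RAW_INPUT.scan(SIMPLE_ATTR)
p RAW_MATCHES
```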

However, since you're really trying to match attra="#{t("a.b.c")}", which is part of some Ruby code that generates XML (and which is not itself valid XML), even an XML parser (such as REXML or Nokogiri) won't solve this problem for you. You'll need your own context-free parser, or you'll need to use the Ripper library that's part of the Ruby 1.9.1 standard library to parse the parts of the attribute that are interpolated Ruby code, and then use some clever hack (like replacing the interpolated Ruby code with a special character string) to match around the attribute value.
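A rough sketch of the placeholder hack might look like this. Note that it does not use Ripper: it finds the end of each #{...} by naive brace counting, which is only a stand-in for a real parser. The method names and the __INTERPn__ placeholder format are invented for illustration, and the sketch assumes that braces inside the interpolated code are balanced and never appear inside string literals, and that the placeholder text does not already occur in the input.

```ruby
# Replace every #{...} with a numbered placeholder and remember the original text.
# Assumption: braces are balanced and do not occur inside Ruby string literals.
def mask_interpolations(src)
  masked = String.new
  saved = []
  i = 0
  while i < src.length
    if src[i, 2] == '#{'
      depth = 0
      j = i + 1 # index of the opening brace
      while j < src.length
        depth += 1 if src[j] == "{"
        depth -= 1 if src[j] == "}"
        break if depth.zero? # found the matching closing brace
        j += 1
      end
      saved << src[i..j]
      masked << "__INTERP#{saved.length - 1}__"
      i = j + 1
    else
      masked << src[i]
      i += 1
    end
  end
  [masked, saved]
end

# Match attributes on the masked text, then put the Ruby code back.
def scan_attributes(src)
  masked, saved = mask_interpolations(src)
  masked.scan(/\w+="[^"]*"/).map do |attr|
    attr.gsub(/__INTERP(\d+)__/) { saved[Regexp.last_match(1).to_i] }
  end
end

p scan_attributes('<tag_name attra="#{t("a" + "b")}" attrb="aa a">')
```

Because the quotes inside the interpolation are hidden before matching, the simple [^"]* pattern is enough for the remaining text. Swapping the brace counter for Ripper would be needed to handle braces or quotes inside nested string literals correctly.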
