
Regexp to match repeated substring

I would like to verify a string containing repeated substrings. Each substring has a particular structure, and so does the whole string (substrings separated by "|"). For instance, the string can be:

1=23.00|6=22.12|12=21.34|112=20.34
1=23.00|6=22.12|12=21.34
1=23.00|12=21.34
1=23.00**

How can I check that all repeated substrings match a regexp? I tried to check it with:

"1=23.00|6=22.12|12=21.34".match(/([1-9][0-9]*[=][0-9\.]+)+/)

But the check still succeeds even when several substrings do not match the regexp:

"1=23.00|6=ass|=21.34".match(/([1-9][0-9]*[=][0-9\.]+)+/)
# => #<MatchData "1=23.00" 1:"1=23.00">

This will return true if there are any duplicate values, false if there are not:

s = "1=23.00|6=22.12|12=21.34|112=20.34|3=23.00"
# Strip the leading "key=" from each pair so that only the values are compared
arr = s.split("|").map { |pair| pair.sub(/\A\d+=/, "") }

arr != arr.uniq # => true

The question is whether every repeated substring matches a regex. I understand that the substrings are separated by the character | or by $/, the latter being the end of a line. We first need to obtain the repeated substrings:

a = str.split(/[#{$/}\|]/)
       .map(&:strip)
       .group_by {|s| s}
       .select {|_,v| v.size > 1 }
       .keys

Next we specify whatever regex you wish to use. I am assuming it is this:

REGEX = /[1-9][0-9]*=[1-9]+\.[0-9]+/

but it could be altered if you have other requirements.
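As one example of such an alteration: the assumed pattern is unanchored, and its `[1-9]+` rejects a zero in the integer part of a value (for instance `112=20.34`). A minimal sketch of a stricter variant follows. `STRICT_REGEX` is a hypothetical name, and the value format (digits, a dot, digits) is an assumption rather than something stated in the question.

```ruby
# Hypothetical stricter variant (the value format is an assumption):
# a key without a leading zero, "=", then digits "." digits,
# anchored so that the whole substring has to match.
STRICT_REGEX = /\A[1-9][0-9]*=[0-9]+\.[0-9]+\z/

STRICT_REGEX.match?("112=20.34")  #=> true  (zero allowed in the integer part)
STRICT_REGEX.match?("1=23.00**")  #=> false (trailing characters rejected)
STRICT_REGEX.match?("6=ass")      #=> false
```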

As we wish to determine whether all repeated substrings match the regex, that is simply:

a.all? {|s| s =~ REGEX}

Here are the calculations:

str =<<_
1=23.00|6=22.12|12=21.34|112=20.34
1=23.00|6=22.12|12=21.34
1=23.00|12=21.34
1=23.00**
_
c = str.split(/[#{$/}\|]/)
  #=> ["1=23.00", "6=22.12", "12=21.34", "112=20.34", "1=23.00",
  #    "6=22.12", "12=21.34", "1=23.00", "12=21.34", "1=23.00**"] 
d = c.map(&:strip)
  # same as c, possibly not needed or not wanted
e = d.group_by {|s| s}
  # => {"1=23.00"  =>["1=23.00", "1=23.00", "1=23.00"],
  #     "6=22.12"  =>["6=22.12", "6=22.12"],
  #     "12=21.34" =>["12=21.34", "12=21.34", "12=21.34"],
  #     "112=20.34"=>["112=20.34"], "1=23.00**"=>["1=23.00**"]} 
f = e.select {|_,v| v.size > 1 }
  #=> {"1=23.00"=>["1=23.00",  "1=23.00" ,  "1=23.00"],
  #    "6=22.12"=>["6=22.12",  "6=22.12"],
  #   "12=21.34"=>["12=21.34", "12=21.34", "12=21.34"]} 
a = f.keys
  #=> ["1=23.00", "6=22.12", "12=21.34"] 
a.all? {|s| s =~ REGEX}
  #=> true
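
The steps above could also be wrapped in a single method. This is only a sketch: the method name and the default pattern are illustrative assumptions, it splits on "\n" rather than `$/`, and it uses `Enumerable#tally` (Ruby 2.7 or later) in place of `group_by`.

```ruby
# Sketch: true when every substring that occurs more than once matches `regex`.
# The method name and the default pattern are illustrative assumptions.
def repeated_substrings_match?(str, regex = /[1-9][0-9]*=[1-9]+\.[0-9]+/)
  str.split(/[\n|]/)                   # split on newlines and "|"
     .map(&:strip)
     .tally                            # count occurrences (Ruby 2.7+)
     .select { |_, count| count > 1 }  # keep only the repeated substrings
     .keys
     .all? { |s| s.match?(regex) }
end

str = <<~TEXT
  1=23.00|6=22.12|12=21.34|112=20.34
  1=23.00|6=22.12|12=21.34
  1=23.00|12=21.34
  1=23.00**
TEXT

repeated_substrings_match?(str)                    #=> true
repeated_substrings_match?("6=ass|6=ass|1=23.00")  #=> false
```

Substrings that occur only once, such as `1=23.00**`, are never tested, which matches the reading of the question used in this answer.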

If you want to solve this with a regexp alone (not Ruby code), you should match the whole string, not the substrings. I added the [|] symbol and an end-of-line anchor to your regexp, and it should work the way you want.

([1-9][0-9]*[=][0-9\.]+[|]*)+$

Try it out.
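
One caveat: the pattern above is not anchored at the start, and `[|]*` makes the separator optional, so a string such as `x|1=23.00` still matches. A hedged sketch of a fully anchored version follows. `PAIR` and `WHOLE_LINE` are hypothetical names, and the value format (digits with an optional fractional part) is an assumption, not part of the original answer.

```ruby
# Sketch of a fully anchored pattern (an assumption, not the answerer's regex):
# one "key=value" pair, followed by zero or more "|"-prefixed pairs.
PAIR = /[1-9][0-9]*=[0-9]+(?:\.[0-9]+)?/
WHOLE_LINE = /\A#{PAIR}(?:\|#{PAIR})*\z/

WHOLE_LINE.match?("1=23.00|6=22.12|12=21.34")  #=> true
WHOLE_LINE.match?("1=23.00")                   #=> true
WHOLE_LINE.match?("1=23.00|6=ass|=21.34")      #=> false
WHOLE_LINE.match?("x|1=23.00")                 #=> false
```

Requiring "|" before every pair after the first rules out both a missing separator and a doubled one.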
