How to use Regexp.union to match a character at the beginning of my string

I'm using Ruby 2.4. I want to match an optional "a" or "b" character, followed by an arbitrary amount of whitespace, and then one or more digits, but none of the following regexes match:

2.4.0 :017 > MY_TOKENS = ["a", "b"]
 => ["a", "b"]
2.4.0 :018 > str = "40"
 => "40"
2.4.0 :019 > str =~ Regexp.new("^[#{Regexp.union(MY_TOKENS)}]?[[:space:]]*\d+[^a-z^0-9]*$")
 => nil
2.4.0 :020 > str =~ Regexp.new("^#{Regexp.union(MY_TOKENS)}?[[:space:]]*\d+[^a-z^0-9]*$")
 => nil
2.4.0 :021 > str =~ Regexp.new("^#{Regexp.union(MY_TOKENS)}?[[:space:]]*\d+$")
 => nil

I'm stumped as to what I'm doing wrong.

If the tokens are single characters, just use MY_TOKENS.join inside the character class:

MY_TOKENS = ["a", "b"]
str = "40"
first_regex = /^[#{MY_TOKENS.join}]?[[:space:]]*\d+[^a-z0-9]*$/
# /^[ab]?[[:space:]]*\d+[^a-z0-9]*$/ 
puts str =~ first_regex
# 0

You can also interpolate the Regexp.union result directly, though this might lead to unexpected bugs, because the flags of the outer regexp won't apply to the inner one:

second_regex = /^#{Regexp.union(MY_TOKENS)}?[[:space:]]*\d+[^a-z0-9]*$/
# /^(?-mix:a|b)?[[:space:]]*\d+[^a-z0-9]*$/
puts str =~ second_regex
# 0

The above regex looks a lot like what you did, but using // instead of Regexp.new saves you from having to escape the backslashes.
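
As a quick check of the backslash point, the sketch below compares the pattern text Ruby actually ends up with in each case. The variable names are illustrative only and are not part of the question's code.

```ruby
# In a double-quoted string, "\d" is not a recognised escape sequence, so
# Ruby keeps only the "d": the pattern then looks for literal "d" characters.
from_string  = Regexp.new("^\d+$")
# Doubling the backslash keeps it in the string, so the regex sees \d.
from_escaped = Regexp.new("^\\d+$")
# A regex literal passes \d through unchanged.
from_literal = /^\d+$/

puts from_string.source    # ^d+$
puts from_escaped.source   # ^\d+$
p "40" =~ from_string      # nil
p "40" =~ from_escaped     # 0
p "40" =~ from_literal     # 0
```

This is why every attempt in the question returned nil: the patterns were built from double-quoted strings, so they never contained \d at all.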

You could use Regexp#source to keep the inner regexp's flags from overriding the outer ones:

third_regex = /^(?:#{Regexp.union(MY_TOKENS).source})?[[:space:]]*\d+[^a-z0-9]*$/
# /^(?:a|b)?[[:space:]]*\d+[^a-z0-9]*$/
puts str =~ third_regex
# 0
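
To make the flag issue concrete, here is a minimal sketch that adds /i to both the interpolated-Regexp form and the .source form. The local names (tokens, embedded, with_source) are made up for the illustration, and the patterns are shortened slightly.

```ruby
tokens = ["a", "b"]   # same values as MY_TOKENS above
union  = Regexp.union(tokens)

# Interpolating the Regexp object inserts union.to_s, which carries its own
# flags: (?-mix:a|b) switches case-insensitivity off inside the group.
embedded    = /^#{union}?[[:space:]]*\d+$/i
# Interpolating union.source inserts only "a|b", so the outer /i applies.
with_source = /^(?:#{union.source})?[[:space:]]*\d+$/i

puts union.to_s            # (?-mix:a|b)
puts union.source          # a|b
p "A 40" =~ embedded       # nil
p "A 40" =~ with_source    # 0
p "a 40" =~ embedded       # 0
```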

or simply build the alternation yourself:

fourth_regex = /^(?:#{MY_TOKENS.join('|')})?[[:space:]]*\d+[^a-z0-9]*$/
# /^(?:a|b)?[[:space:]]*\d+[^a-z0-9]*$/
puts str =~ fourth_regex
# 0

The last three examples should work fine if MY_TOKENS contains words instead of just single characters.
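
A small sketch of the multi-character case, using hypothetical tokens ("no." and "qty") that are not in the original question. It also shows one difference worth keeping in mind: Regexp.union escapes regex metacharacters in the tokens, whereas a plain join('|') does not.

```ruby
words = ["no.", "qty"]   # hypothetical multi-character tokens

# A character class only ever matches one character, so it cannot match "qty".
char_class = /^[#{words.join}]?[[:space:]]*\d+$/
# Regexp.union escapes the dot: the group becomes (?:no\.|qty)
by_union   = /^(?:#{Regexp.union(words).source})?[[:space:]]*\d+$/
# join('|') leaves the dot unescaped: the group becomes (?:no.|qty)
by_join    = /^(?:#{words.join('|')})?[[:space:]]*\d+$/

p "qty 7"  =~ char_class   # nil
p "qty 7"  =~ by_union     # 0
p "qty 7"  =~ by_join      # 0
p "no. 12" =~ by_union     # 0
p "nox 12" =~ by_union     # nil
p "nox 12" =~ by_join      # 0, because the unescaped "." matches "x"
```

If the tokens can contain characters such as "." or "+", the join('|') variant would need each token passed through Regexp.escape first.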

first_regex, third_regex and fourth_regex should all work fine with the /i flag.

As an example:

first_regex = /^[#{MY_TOKENS.join}]?[[:space:]]*\d+[^a-z0-9]*$/i
"A 40" =~ first_regex
# 0

I believe you want to match a string that may start with any of the alternatives defined in MY_TOKENS, followed by zero or more whitespace characters and then one or more digits up to the end of the string.

Then you need to use

Regexp.new("\\A#{Regexp.union(MY_TOKENS)}?[[:space:]]*\\d+\\z").match?(s)

or

/\A#{Regexp.union(MY_TOKENS)}?[[:space:]]*\d+\z/.match?(s)

When you use Regexp.new with a double-quoted string, remember to double the backslashes to define a literal backslash (e.g. "\\d" is a digit-matching pattern). In regex literal notation, you may use a single backslash (/\d/).

Do not forget to match the start of the string with the \A anchor and the end of the string with the \z anchor.
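
The difference between the line anchors (^ and $) and the string anchors (\A and \z) only shows up with input that contains line breaks. A minimal sketch, with illustrative names and single-character tokens written out as [ab]:

```ruby
lenient = /^[ab]?[[:space:]]*\d+$/      # ^ and $ match at every line boundary
strict  = /\A[ab]?[[:space:]]*\d+\z/    # \A and \z match only at the string ends

multiline = "foo\n40"
p multiline =~ lenient        # 4, the second line matches on its own
p multiline.match?(strict)    # false
p "a 40".match?(strict)       # true
p "40\n".match?(strict)       # false, \z does not skip a trailing newline
```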

Note that [...] creates a character class that matches any single char defined inside it: [ab] matches an a or b, and [program] will match one char, either p, r, o, g, a or m. If you have multicharacter sequences in MY_TOKENS, you need to remove [...] from the pattern.

To make the regex case insensitive, pass a case-insensitive modifier to the pattern and make sure you use the .source property of the regex created by Regexp.union to remove its flags (thanks, Eric). Since .source returns the bare a|b, wrap it in a non-capturing group so that the ? applies to the whole alternation:

Regexp.new("(?i)\\A(?:#{Regexp.union(MY_TOKENS).source})?[[:space:]]*\\d+\\z")

or

/\A(?:#{Regexp.union(MY_TOKENS).source})?[[:space:]]*\d+\z/i

Calling to_s on the latter regex gives (?i-mx:\A(?:a|b)?[[:space:]]*\d+\z), where (?i-mx) means the case-insensitive mode is on and the multiline (dot matches line breaks) and verbose modes are off.
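
Because .source returns the bare alternation a|b, operator precedence matters when it is interpolated. The sketch below (local names are illustrative) compares an ungrouped interpolation with one wrapped in a non-capturing group:

```ruby
tokens = ["a", "b"]                    # same values as MY_TOKENS
src    = Regexp.union(tokens).source   # "a|b"

# Without a group, the | splits the whole pattern in two:
#   \Aa   OR   b?[[:space:]]*\d+\z
ungrouped = /\A#{src}?[[:space:]]*\d+\z/i
# With (?:...), the ? and the anchors apply to the alternation as a unit.
grouped   = /\A(?:#{src})?[[:space:]]*\d+\z/i

p "apple".match?(ungrouped)   # true, via the \Aa branch
p "x40".match?(ungrouped)     # true, the second branch is not anchored at the start
p "apple".match?(grouped)     # false
p "x40".match?(grouped)       # false
p "B 40".match?(grouped)      # true
p "40".match?(grouped)        # true
```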
