How to use Regexp.union to match a character at the beginning of my string
I'm using Ruby 2.4. I want to match an optional "a" or "b" character, followed by an arbitrary amount of whitespace, and then one or more digits, but my regexes fail to match any of these:
2.4.0 :017 > MY_TOKENS = ["a", "b"]
=> ["a", "b"]
2.4.0 :018 > str = "40"
=> "40"
2.4.0 :019 > str =~ Regexp.new("^[#{Regexp.union(MY_TOKENS)}]?[[:space:]]*\d+[^a-z^0-9]*$")
=> nil
2.4.0 :020 > str =~ Regexp.new("^#{Regexp.union(MY_TOKENS)}?[[:space:]]*\d+[^a-z^0-9]*$")
=> nil
2.4.0 :021 > str =~ Regexp.new("^#{Regexp.union(MY_TOKENS)}?[[:space:]]*\d+$")
=> nil
I'm stumped as to what I'm doing wrong.
If they are single characters, just use MY_TOKENS.join inside the character class:
MY_TOKENS = ["a", "b"]
str = "40"
first_regex = /^[#{MY_TOKENS.join}]?[[:space:]]*\d+[^a-z0-9]*$/
# /^[ab]?[[:space:]]*\d+[^a-z0-9]*$/
puts str =~ first_regex
# 0
You can also interpolate the Regexp.union result directly. It might lead to some unexpected bugs, though, because the flags of the outer regexp won't apply to the inner one:
second_regex = /^#{Regexp.union(MY_TOKENS)}?[[:space:]]*\d+[^a-z0-9]*$/
# /^(?-mix:a|b)?[[:space:]]*\d+[^a-z0-9]*$/
puts str =~ second_regex
# 0
The above regex looks a lot like what you did, but using // instead of Regexp.new saves you from having to escape the backslashes.
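As a minimal sketch of why the backslashes matter: in a double-quoted Ruby string, "\d" is not a recognised escape sequence, so the backslash is dropped and Regexp.new receives a plain d. The variable names below are only illustrative.

```ruby
# "\d" inside double quotes collapses to "d", so this pattern is really ^d+$
broken  = Regexp.new("^\d+$")
# Doubling the backslash keeps it in the string, giving the pattern ^\d+$
escaped = Regexp.new("^\\d+$")
# A regex literal needs no doubling
literal = /^\d+$/

puts broken.source    # ^d+$
puts escaped.source   # ^\d+$
p "40" =~ broken      # nil
p "40" =~ escaped     # 0
p "40" =~ literal     # 0
```

This is the reason the three attempts in the question return nil for "40".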
You could use Regexp#source to avoid the flag problem:
third_regex = /^(?:#{Regexp.union(MY_TOKENS).source})?[[:space:]]*\d+[^a-z0-9]*$/
# /^(?:a|b)?[[:space:]]*\d+[^a-z0-9]*$/
puts str =~ third_regex
# 0
Or simply build the alternation yourself:
fourth_regex = /^(?:#{MY_TOKENS.join('|')})?[[:space:]]*\d+[^a-z0-9]*$/
# /^(?:a|b)?[[:space:]]*\d+[^a-z0-9]*$/
puts str =~ fourth_regex
# 0
The last three examples should work fine if MY_TOKENS contains words instead of single characters.
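One caveat worth sketching, assuming the tokens could ever contain regex metacharacters (the token list below is hypothetical): join('|') inserts the strings unescaped, whereas Regexp.union escapes each string before joining.

```ruby
tokens = ["a.", "b"]   # hypothetical tokens; "." is a regex metacharacter

# Hand-built alternation: the "." stays unescaped and matches any character
joined = /\A(?:#{tokens.join('|')})?\d+\z/
# Regexp.union escapes string arguments, so the "." only matches a literal dot
united = /\A(?:#{Regexp.union(tokens).source})?\d+\z/

p "ax40".match?(joined)   # true
p "ax40".match?(united)   # false
p "a.40".match?(united)   # true
```

If the hand-built form is preferred, mapping the tokens through Regexp.escape before joining should give the same protection.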
first_regex, third_regex and fourth_regex should all work fine with the /i flag. As an example:
first_regex = /^[#{MY_TOKENS.join}]?[[:space:]]*\d+[^a-z0-9]*$/i
"A 40" =~ first_regex
# 0
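To see the flag issue concretely, here is a small sketch comparing direct interpolation of the Regexp.union result with interpolation of its source. The local variable names are illustrative, not part of the original code.

```ruby
tokens = ["a", "b"]

# Interpolating the Regexp embeds (?-mix:a|b), which switches /i off inside the group
with_union  = /^#{Regexp.union(tokens)}?[[:space:]]*\d+$/i
# Interpolating only the source lets the outer /i apply to the tokens too
with_source = /^(?:#{Regexp.union(tokens).source})?[[:space:]]*\d+$/i

p "A 40" =~ with_union    # nil
p "A 40" =~ with_source   # 0
p "a 40" =~ with_union    # 0
```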
I believe you want to match a string that may start with any of the alternatives defined in MY_TOKENS, followed by zero or more whitespace characters and then one or more digits up to the end of the string. Then you need to use
Regexp.new("\\A#{Regexp.union(MY_TOKENS)}?[[:space:]]*\\d+\\z").match?(s)
or
/\A#{Regexp.union(MY_TOKENS)}?[[:space:]]*\d+\z/.match?(s)
When you use Regexp.new with a double-quoted string, remember to double the backslashes so that the pattern receives a literal backslash (e.g. "\\d" is the digit-matching pattern). In regex literal notation, a single backslash is enough (/\d/).
Do not forget to match the start of the string with the \A anchor and the end of the string with the \z anchor.
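A short sketch of why the anchors matter in Ruby, where ^ and $ match at the start and end of every line rather than of the whole string. The input string is made up for illustration.

```ruby
multiline_input = "foo\n40"   # hypothetical input with an embedded newline

# ^ and $ anchor to line boundaries, so the second line alone satisfies the pattern
p multiline_input =~ /^\d+$/     # 4
# \A and \z anchor to the whole string, so the leading "foo\n" prevents a match
p multiline_input =~ /\A\d+\z/   # nil
p "40" =~ /\A\d+\z/              # 0
```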
Note that [...] creates a character class that matches any single character listed inside it: [ab] matches an a or a b, and [program] matches one character, either p, r, o, g, a or m. If you have multicharacter sequences in MY_TOKENS, you need to remove the [...] from the pattern.
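The difference can be sketched with a hypothetical list of multi-character tokens: joined into a character class they match only one of their letters, while an alternation group matches the whole word.

```ruby
words = ["cat", "dog"]   # hypothetical multi-character tokens

# [catdog] matches a single character out of c, a, t, d, o, g
char_class  = /\A[#{words.join}]?[[:space:]]*\d+\z/
# (?:cat|dog) matches one whole word
alternation = /\A(?:#{Regexp.union(words).source})?[[:space:]]*\d+\z/

p "cat 40".match?(char_class)    # false
p "c 40".match?(char_class)      # true
p "cat 40".match?(alternation)   # true
```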
To make the regex case insensitive, pass a case-insensitive modifier to the pattern and use the .source property of the regex created by Regexp.union to drop its own flags (thanks, Eric). Because .source returns the bare a|b, wrap it in a non-capturing group so that the alternation stays contained and the ? applies to the whole group:
Regexp.new("(?i)\\A(?:#{Regexp.union(MY_TOKENS).source})?[[:space:]]*\\d+\\z")
or
/\A(?:#{Regexp.union(MY_TOKENS).source})?[[:space:]]*\d+\z/i
Converted to a string with to_s, the literal form reads (?i-mx:\A(?:a|b)?[[:space:]]*\d+\z), where (?i-mx) means that the case-insensitive mode is on and the multiline (dot matches line breaks) and verbose modes are off.
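A brief sketch of what happens when the source of a Regexp.union result is interpolated without a surrounding group: the | then splits the entire pattern into two alternatives, so the anchors and the ? no longer cover what was intended. Variable names are illustrative.

```ruby
tokens = ["a", "b"]
source = Regexp.union(tokens).source   # "a|b", without the (?-mix:...) wrapper

# Ungrouped: reads as "\Aa" OR "b?[[:space:]]*\d+\z"
ungrouped = /\A#{source}?[[:space:]]*\d+\z/i
# Grouped: the optional token is contained in (?:...)
grouped   = /\A(?:#{source})?[[:space:]]*\d+\z/i

p "a xyz".match?(ungrouped)   # true
p "xyz40".match?(ungrouped)   # true
p "a xyz".match?(grouped)     # false
p "B 40".match?(grouped)      # true
```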