Ruby Regex matching unexpected characters
I am trying to write a script that parses the filename of a comic book and extracts info such as the series name, publication year, etc. In this case, I am trying to extract the publication year from the name.
Consider the following name, from which I need to match and get the value 2003. Below is the expression I had for this.
r = %r{ (?i)(^|[,\s-_])v(\d{4})($|[,\s-_]) }
However, this matches the number irrespective of what character I have before the v or after the number.
I expect the first two to not match and the third to match.
What am I doing wrong in this case?
Inside character classes (i.e. []s), the - character has a special meaning when it's between two other characters: it creates a range starting at the character before it and ending at the character after it.
Here, you want it literally, so you should either escape the - or (more idiomatically in regex) put it as the first or last character in the [].
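To make the range behaviour concrete, here is a minimal sketch using a reduced class with just the comma, hyphen, and underscore. The variable names are only illustrative. Because ',' is 0x2C and '_' is 0x5F, the range form also accepts digits and uppercase letters, while the two literal forms accept only the three punctuation characters.

```ruby
# ',' is 0x2C and '_' is 0x5F, so [,-_] spans both and everything between them.
range_class  = /[,-_]/   # '-' between two characters: a range
literal_last = /[,_-]/   # '-' placed last: a literal hyphen
literal_esc  = /[,\-_]/  # '-' escaped: a literal hyphen

["A", "7", "-", ",", "_", "a"].each do |ch|
  puts "#{ch.inspect}: range=#{range_class.match?(ch)} " \
       "last=#{literal_last.match?(ch)} escaped=#{literal_esc.match?(ch)}"
end
```

The lowercase "a" (0x61) falls outside the range, which is why only some unexpected characters slip through.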
Also, by the way, you have literal space characters in the pattern but no /x modifier, so those spaces have to match literally. You probably also don't want to capture what comes before and after the year, so the final pattern would be:
%r{(?i)(?:^|[,\s_-])v(\d{4})(?:$|[,\s_-])}
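As a rough usage sketch, the pattern above can be applied with String#[] and a capture index. The constant YEAR_RE and the helper publication_year are hypothetical names introduced here for illustration. The filenames are the three test names used elsewhere in this thread.

```ruby
# Pattern from the answer above: non-capturing delimiters, one capture group for the year.
YEAR_RE = %r{(?i)(?:^|[,\s_-])v(\d{4})(?:$|[,\s_-])}

# Hypothetical helper: returns the year as a String, or nil when nothing matches.
def publication_year(filename)
  filename[YEAR_RE, 1]
end

names = [
  "010 - All Star Batman & Robin The Boy Wonder 01 - av2003",
  "010 - All Star Batman & Robin The Boy Wonder 01 - v2003t",
  "010 - All Star Batman & Robin The Boy Wonder 01 - v2003"
]
names.each { |name| puts "#{name.inspect} => #{publication_year(name).inspect}" }

# Without the x flag, spaces inside the regex literal must match literally.
spaced = %r{ v(\d{4}) }
p "v2003"[spaced, 1]    # no surrounding spaces in the input, so no match
p " v2003 "[spaced, 1]  # spaces present, so the year is captured
```

Only the third filename yields "2003". The first two return nil because the v is preceded by a letter or the year is followed by one.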
@smathy answered your question (rather nicely). I want to point out that you could write your regex without a capture group:
r = /
  (?:            # begin a non-capture group
    ^|[,\s_-]    # match the beginning of a line, a whitespace char or a char in ',_-'
  )              # end the non-capture group
  v              # match v
  \K             # forget everything matched so far
  \d{4}          # match 4 digits
  (?=            # begin a positive lookahead
    $|[,\s_-]    # match the end of a line, a whitespace char or a char in ',_-'
  )              # end the positive lookahead
/x
"010 - All Star Batman & Robin The Boy Wonder 01 - av2003"[r]
#=> nil
"010 - All Star Batman & Robin The Boy Wonder 01 - v2003t"[r]
#=> nil
"010 - All Star Batman & Robin The Boy Wonder 01 - v2003"[r]
#=> "2003"
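If you want to match v or V, change the line v to [vV].
If you want the regex to be case-insensitive, change /x to /ix (in which case there is no need to replace v with [vV]).
If you want to ensure the publication year is in (say) the 20th or 21st century, change \d{4} to [12]\d{3}. This is only a loose check, since [12]\d{3} accepts any year from 1000 to 2999.
You could also change the non-capture group to a positive lookbehind ((?<=^|[,\s_-])) and delete \K. Note that the v then becomes part of the match, so "v2003" is returned rather than "2003".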
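The variations in these notes can be sketched as follows. The variable names and the modified filenames (V2003, v3003) are made up for illustration. The first regex combines the /ix flag with [12]\d{3}; the second swaps the non-capture group for a lookbehind and drops \K.

```ruby
# Case-insensitive variant with a restricted first digit (years 1000 to 2999).
strict = /
  (?:^|[,\s_-])   # line start, whitespace, or one of , _ -
  v               # v or V, because of the i flag
  \K              # drop what was matched so far from the result
  [12]\d{3}       # four digits, the first being 1 or 2
  (?=$|[,\s_-])   # followed by line end, whitespace, or one of , _ -
/ix

# Lookbehind variant without \K: the v is now part of the reported match.
lookbehind = /
  (?<=^|[,\s_-])  # preceded by line start, whitespace, or one of , _ -
  v
  \d{4}
  (?=$|[,\s_-])
/x

base = "010 - All Star Batman & Robin The Boy Wonder 01 - "
p (base + "V2003")[strict]      # uppercase V is accepted
p (base + "v3003")[strict]      # first digit 3 is rejected
p (base + "v2003")[lookbehind]  # match includes the leading v
```

If only the digits are wanted from the lookbehind form, one option is to keep \K after the v.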