Unexpected Ruby Regexp behaviour
Given the following string, str:
\begin{align*}
\intertext{Here is some text}
x^{2}+2x+3=2\\
\intertext{Here is some more}
\end{align*}
I would like to move the intertext strings outside of the align environment, like so:
Here is some text
\begin{align*}
x^{2}+2x+3=2\\
\end{align*}
Here is some more
Note that I only want to do this when \intertext appears immediately after a \begin{something} or immediately before an \end{something}. With this in mind, I wrote the following Regexps:
begin_align = /\\begin\{([^}]*)\}\n\\intertext\{([^}]*)\}/m
end_align = /\\intertext\{([^}]*)\}\n\\end\{([^}]*)\}/m
Because of the capture groups (the parts in parentheses), when I call m = str.match(begin_align), I can grab m[0] (the matched string), m[1] (which should be the given environment, align* in this example), and m[2], which should be the text inside intertext. If I write str.match(m[0]) I get nil. Why?
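The setup described above can be reproduced with a short script. This is only a sketch: the constant names STR and BEGIN_ALIGN are chosen here for illustration, and the regex is copied from the question.

```ruby
# The sample document, kept verbatim with a non-interpolating heredoc.
STR = <<~'LATEX'
  \begin{align*}
  \intertext{Here is some text}
  x^{2}+2x+3=2\\
  \intertext{Here is some more}
  \end{align*}
LATEX

# Same pattern as begin_align in the question.
BEGIN_ALIGN = /\\begin\{([^}]*)\}\n\\intertext\{([^}]*)\}/m

m = STR.match(BEGIN_ALIGN)
p m[0]  # the whole matched text
p m[1]  # "align*"
p m[2]  # "Here is some text"

# A String argument to String#match is compiled into a Regexp first,
# so the backslashes and the asterisk inside m[0] are read as regex syntax.
p STR.match(m[0])  # nil
```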
I found a way around this: if I instead call str.match(Regexp.quote(m[0])), I get a match. However, if I then try to replace this match with str.sub(Regexp.quote(m[0]),''), say, nothing happens. If instead I write str.sub(m[0],''), I get the expected result. How come?
While I was trying to debug this example, I noticed something else that I can't understand. If I write "\\begin{align".match("\\begin{align"), I get no match despite them being identical strings. If I 'escape' the second \\ as: "\\begin{align".match("\\\\begin{align"), then I get a match. If I then try to add the asterisk, "\\begin{align*".match("\\\\begin{align*"), I get #<MatchData "\\begin{align">: it ignores the asterisk. I have to escape the second asterisk with \\*. What's going on?
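The observations above follow from two layers of escaping: the Ruby string literal consumes one level of backslashes, and the regex compiled from the resulting String consumes another. A minimal sketch (the constant name SUBJECT is introduced here only for illustration):

```ruby
SUBJECT = "\\begin{align*"  # the characters: \begin{align*

# What the regex engine actually sees once the String is compiled:
p Regexp.new("\\begin{align").source   # \b is a word-boundary anchor here
p Regexp.new("\\\\begin{align").source # \\ is a literal backslash

p SUBJECT.match("\\begin{align")        # nil
p SUBJECT.match("\\\\begin{align*")     # stops after "align": n* means zero or more n
p SUBJECT.match("\\\\begin{align\\*")   # the escaped asterisk is matched literally

# Regexp.quote performs the escaping for you.
p SUBJECT.match(Regexp.quote(SUBJECT))
```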
The value of m[0] is:
\\begin{align*}\n\\intertext{Here is some text}
The Ruby documentation for String#sub says of its pattern argument: The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally.
So m[0] contains *, which is a quantifier in a regular expression. Passed as a String to .sub(), '*' means nothing but a literal * character. Passed as a String to .match(), it is compiled into a Regexp and interpreted as a quantifier, which is also why str.match('*') raises an error. In a regex context, align* means the string alig followed by any number of n characters. The backslashes in m[0] are reinterpreted the same way: \b becomes a word-boundary anchor instead of a backslash followed by b.
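The difference between the two methods can be checked directly. The following is a small sketch; M0, DOC and compiles? are names made up for this illustration, with M0 set to the value of m[0] shown earlier.

```ruby
M0  = "\\begin{align*}\n\\intertext{Here is some text}"
DOC = M0 + "\nx^{2}+2x+3=2\\\\\n\\end{align*}\n"

# Hypothetical helper: true if the String is a valid regex source.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

# String#sub with a String pattern is a plain substring search.
p DOC.sub(M0, "") != DOC                 # true: the text was found and removed
p DOC.sub(Regexp.quote(M0), "") == DOC   # true: the escaped text does not occur in DOC

# String#match compiles a String pattern into a Regexp.
p DOC.match(M0)                          # nil
p DOC.match(Regexp.quote(M0))[0] == M0   # true

# A bare quantifier is not a valid regex, hence the error from match('*').
p compiles?("*")                         # false
```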
So for .match() to work with a String pattern you have to escape such special characters, which is what Regexp.quote does. For .sub(), a String pattern is already taken literally, so wrapping it in Regexp.quote only adds backslashes that do not occur in str, and the substitution finds nothing.
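For the original goal, it may be simpler to skip the intermediate m[0] entirely and pass the Regexps straight to gsub. The following is one possible sketch, not the only approach; hoist_intertext is a hypothetical name, and the patterns are the ones from the question, so they share the same limitation (no nested braces inside \intertext).

```ruby
# Moves \intertext{...} out of an environment when it directly follows
# \begin{...} or directly precedes \end{...}.
def hoist_intertext(src)
  src
    .gsub(/\\begin\{([^}]*)\}\n\\intertext\{([^}]*)\}/) { "#{$2}\n\\begin{#{$1}}" }
    .gsub(/\\intertext\{([^}]*)\}\n\\end\{([^}]*)\}/)   { "\\end{#{$2}}\n#{$1}" }
end

sample = <<~'LATEX'
  \begin{align*}
  \intertext{Here is some text}
  x^{2}+2x+3=2\\
  \intertext{Here is some more}
  \end{align*}
LATEX

puts hoist_intertext(sample)
```

The block form of gsub is used because the block's return value is inserted as-is, so backslashes in the replacement need no extra escaping beyond the string literal.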