
Unexpected Ruby Regexp behaviour

Given the following string, str:

\begin{align*}
\intertext{Here is some text}
x^{2}+2x+3=2\\
\intertext{Here is some more}
\end{align*}

I would like to move the intertext strings outside of the align environment, like so:

Here is some text
\begin{align*}
x^{2}+2x+3=2\\
\end{align*}
Here is some more

Note that I only want to do this when an intertext appears immediately after a \begin{something} or immediately before an \end{something}. With this in mind, I wrote the following Regexps:

begin_align = /\\begin\{([^}]*)\}\n\\intertext\{([^}]*)\}/m
end_align = /\\intertext\{([^}]*)\}\n\\end\{([^}]*)\}/m

Because of the capture groups in parentheses, when I call m = str.match(begin_align), I can grab m[0] (the matched string), m[1] (which should be the given environment, align* in this example), and m[2], which should be the text inside intertext. If I write str.match(m[0]) I get nil. Why?

I found a way around this: if I instead call str.match(Regexp.quote(m[0])), I get a match. However, if I then try to replace this match with, say, str.sub(Regexp.quote(m[0]),''), nothing happens. If instead I write str.sub(m[0],''), I get the expected result. How come?

While I was trying to debug this example, I noticed something else that I can't understand. If I write
"\\begin{align".match("\\begin{align")
I get no match, despite the two strings being identical. If I 'escape' the backslash in the second string:
"\\begin{align".match("\\\\begin{align")
then I get a match. If I then add the asterisk:
"\\begin{align*".match("\\\\begin{align*")
I get #<MatchData "\\begin{align">: it ignores the asterisk. I have to escape the asterisk in the second string with \\*. What's going on?

m[0]:

\\begin{align*}\n\\intertext{Here is some text}
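As a rough illustration of why str.match(m[0]) returns nil: String#match compiles a String argument into a Regexp, so the leading \b of \begin is read as a word-boundary anchor rather than as a backslash followed by b. The sketch below rebuilds a shortened version of the example string; the variable names literal, as_pattern, escaped and as_escaped are chosen only for this illustration.

```ruby
# Shortened version of the example string (single quotes keep backslashes as-is).
str = '\begin{align*}' + "\n" + '\intertext{Here is some text}'
begin_align = /\\begin\{([^}]*)\}\n\\intertext\{([^}]*)\}/

m = str.match(begin_align)
literal = m[0]                    # the matched text, backslashes included

# Passed as a String, the text is compiled into a Regexp: "\b" becomes a
# word-boundary anchor and "n*" a quantified "n", so nothing matches.
as_pattern = str.match(literal)   # nil

# Regexp.quote backslash-escapes the metacharacters first, so the
# resulting pattern matches the text literally.
escaped = Regexp.quote(literal)
as_escaped = str.match(escaped)   # MatchData covering the whole string
```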

Note on .sub():

The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally.

So m[0] contains *, which is a regex quantifier. Passed to .sub() as part of a String, it is nothing but a literal * character. Passed to .match() as part of a String, however, it is interpreted as a quantifier, which is also why str.match('*') raises an error (a bare * has nothing to repeat). In a regex context, align* means the string alig followed by any number of n characters (including none).
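The difference can be seen in a small sketch. The sample string s and the variable names are made up for this illustration and are not from the question.

```ruby
s = 'align* and alignnn'

# String#sub with a String pattern: '*' is a literal asterisk.
sub_literal = s.sub('align*', 'X')               # replaces the text "align*"

# String#match compiles a String pattern into a Regexp, so 'n*' means
# "zero or more n" and the literal asterisk in s is never consumed.
match_regex = s.match('align*')[0]

# A bare '*' has nothing to repeat, so compiling it fails.
bare_star_raises =
  begin
    s.match('*')
    false
  rescue RegexpError
    true
  end

# Regexp.quote helps match, but breaks sub with a String pattern,
# because the added backslash is then searched for literally.
match_quoted = s.match(Regexp.quote('align*'))[0]
sub_quoted   = s.sub(Regexp.quote('align*'), 'X') # s comes back unchanged
```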

So for .match() to work with a String pattern you have to take care of such special characters, for example with Regexp.quote. With .sub(), a String pattern is already matched literally, so running it through Regexp.quote first only adds backslashes that are not in str, and nothing is replaced.
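One way to sidestep the issue entirely, offered only as a sketch and not as the original poster's method, is to pass the two Regexps from the question straight to gsub and rebuild the text from the capture groups in a block. That way m[0] never has to be re-used as a pattern. The block form is an assumption of this sketch; it avoids backslash sequences in a replacement String being read as backreferences.

```ruby
# The example string from the question (a quoted heredoc keeps backslashes raw).
str = <<~'TEX'
  \begin{align*}
  \intertext{Here is some text}
  x^{2}+2x+3=2\\
  \intertext{Here is some more}
  \end{align*}
TEX

begin_align = /\\begin\{([^}]*)\}\n\\intertext\{([^}]*)\}/
end_align   = /\\intertext\{([^}]*)\}\n\\end\{([^}]*)\}/

# In each block, $1 and $2 are the capture groups of the current match.
result = str
  .gsub(begin_align) { "#{$2}\n\\begin{#{$1}}" }  # text first, then \begin{env}
  .gsub(end_align)   { "\\end{#{$2}}\n#{$1}" }    # \end{env} first, then text

puts result
```

An intertext that is not adjacent to a \begin{...} or \end{...} line is left untouched, since neither Regexp matches it.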
