Unexpected Ruby Regexp behavior

Question

Given the following string, str:

\begin{align*}
\intertext{Here is some text}
x^{2}+2x+3=2\\
\intertext{Here is some more}
\end{align*}

I would like to move the intertext strings outside of the align environment, like so:

Here is some text
\begin{align*}
x^{2}+2x+3=2\\
\end{align*}
Here is some more

Note that I only want to do this when an \intertext appears immediately after a \begin{something} or immediately before an \end{something}. With this in mind, I wrote the following Regexps:

begin_align = /\\begin\{([^}]*)\}\n\\intertext\{([^}]*)\}/m
end_align = /\\intertext\{([^}]*)\}\n\\end\{([^}]*)\}/m

Because of the capture groups in parentheses, when I call m = str.match(begin_align), I can grab m[0] (the matched string), m[1] (the environment name, align* in this example), and m[2] (the text inside intertext). But if I write str.match(m[0]), I get nil. Why?

I found a way around this: if I instead call str.match(Regexp.quote(m[0])), I get a match. However, if I then try to remove this match with str.sub(Regexp.quote(m[0]), ''), nothing happens. If I instead write str.sub(m[0], ''), I get the expected result. How come?

While I was trying to debug this example, I noticed something else that I can't understand. If I write
"\\begin{align".match("\\begin{align")
I get no match, despite the two strings being identical. If I escape the backslash in the second string, as in
"\\begin{align".match("\\\\begin{align")
then I get a match. If I then add the asterisk,
"\\begin{align*".match("\\\\begin{align*")
I get #<MatchData "\\begin{align">: it ignores the asterisk. I have to escape the second asterisk as \\*. What's going on?

Answer 1

m[0] is:

\\begin{align*}\n\\intertext{Here is some text}

Note on .sub():

The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally.

So m[0] contains *, which is a regex quantifier. Passed to .sub() inside a String, it is nothing but a literal * character. Passed to .match() inside a String, the String is first converted to a Regexp, so the * is interpreted as a quantifier; that is also why str.match('*') raises an error. In a regex context, align* means the string alig followed by any number of n characters. The backslash is affected in the same way: as a regex, \b is a word boundary, not a literal backslash followed by b.
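The difference between the two methods can be checked directly. The snippet below is only a sketch that rebuilds the str from the question in a heredoc; the variable names other than str, m and begin_align are made up for illustration.

```ruby
# str from the question; the single-quoted heredoc keeps backslashes as typed.
str = <<~'TEX'
  \begin{align*}
  \intertext{Here is some text}
  x^{2}+2x+3=2\\
  \intertext{Here is some more}
  \end{align*}
TEX

begin_align = /\\begin\{([^}]*)\}\n\\intertext\{([^}]*)\}/
m = str.match(begin_align)
whole = m[0]

# String#match compiles a String argument into a Regexp, so "\b" and "n*"
# in `whole` are read as regex syntax and no longer describe the text.
as_regex = str.match(whole)
quoted   = str.match(Regexp.quote(whole))

# String#sub compares a String pattern character by character.
literal_sub = str.sub(whole, "")
quoted_sub  = str.sub(Regexp.quote(whole), "")

# The smaller cases from the question behave the same way.
no_escape   = "\\begin{align".match("\\begin{align")
with_escape = "\\begin{align*".match("\\\\begin{align*")

puts as_regex.inspect        # no match
puts quoted[0] == whole      # quoted pattern matches the same text
puts quoted_sub == str       # quoted String pattern leaves str unchanged
puts with_escape[0]          # the trailing * is not part of the match
```

Here Regexp.quote helps .match() because the result is compiled as a regex, and hurts .sub() because the added backslashes are searched for literally.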

So for .match() to work with a String pattern, you have to escape such special characters, for example with Regexp.quote. For .sub() a String pattern already matches literally, so running it through Regexp.quote first only adds backslashes that do not occur in str, and the substitution finds nothing.
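For the original goal of moving the intertext out of the environment, one way to sidestep the String-versus-Regexp question entirely is to pass the Regexps themselves to gsub and build the replacement in a block. This is a sketch, not the asker's code: it reuses the two patterns from the question, and the name moved is an assumption.

```ruby
# str from the question; the single-quoted heredoc keeps backslashes as typed.
str = <<~'TEX'
  \begin{align*}
  \intertext{Here is some text}
  x^{2}+2x+3=2\\
  \intertext{Here is some more}
  \end{align*}
TEX

begin_align = /\\begin\{([^}]*)\}\n\\intertext\{([^}]*)\}/
end_align   = /\\intertext\{([^}]*)\}\n\\end\{([^}]*)\}/

# The block form builds the replacement as an ordinary String, so no
# backslash escaping rules for replacement strings get in the way.
moved = str
  .gsub(begin_align) { "#{$2}\n\\begin{#{$1}}" }  # text first, then \begin{env}
  .gsub(end_align)   { "\\end{#{$2}}\n#{$1}" }    # \end{env} first, then text

puts moved
```

Because the patterns are anchored on \begin{...} and \end{...}, an \intertext in the middle of an environment is left where it is.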

Answer 1
0 2018-05-11 14:04:26
