Unexpected Ruby Regexp behaviour

Given the following string, str :

\begin{align*}
\intertext{Here is some text}
x^{2}+2x+3=2\\
\intertext{Here is some more}
\end{align*}

I would like to move the intertext strings outside of the align environment, like so:

Here is some text
\begin{align*}
x^{2}+2x+3=2\\
\end{align*}
Here is some more

Note that I only want to do this when intertext appears immediately before or after a \begin{something} or an \end{something}. With this in mind, I wrote the following Regexps:

begin_align = /\\begin\{([^}]*)\}\n\\intertext\{([^}]*)\}/m
end_align = /\\intertext\{([^}]*)\}\n\\end\{([^}]*)\}/m

Because of the capture groups (the parenthesised parts), when I call m = str.match(begin_align), I can grab m[0] (the matched string), m[1] (which should be the given environment, align* in this example), and m[2], which should be the text inside intertext. If I write str.match(m[0]) I get nil. Why?

I found a way around this: if I instead call str.match(Regexp.quote(m[0])), I get a match. However, if I then try to replace this match with, say, str.sub(Regexp.quote(m[0]),''), nothing happens. If instead I write str.sub(m[0],''), I get the expected result. How come?

While I was trying to debug this example, I noticed something else that I can't understand. If I write "\\begin{align".match("\\begin{align"), I get no match despite them being identical strings. If I 'escape' the second \\ as:
"\\begin{align".match("\\\\begin{align"),
then I get a match. If I then try to add the asterisk:
"\\begin{align*".match("\\\\begin{align*"),
I get #<MatchData "\\begin{align">: it ignores the asterisk. I have to escape the second asterisk with \\*. What's going on?

m[0] contains:

\\begin{align*}\n\\intertext{Here is some text}

Note on .sub():

The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally.

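So m[0] contains *, which is a quantifier in a regular expression. Passed to .sub() inside a String, it means nothing but a literal * character. Passed to .match(), the String is first converted to a Regexp, so * is interpreted as a quantifier; this is also why str.match('*') raises an error. In a regex context, align* means the string alig followed by any number of n characters. The same conversion turns the leading \b of \begin into a word-boundary anchor rather than a backslash followed by b.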
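The difference can be checked directly. The following is a minimal sketch using the str and begin_align from the question; the variable names literal, as_regexp, as_quoted, sub_literal, sub_quoted and short are only illustrative and do not come from the original post.

```ruby
# The question's string; a single-quoted heredoc keeps every backslash literal.
str = <<~'TEX'
  \begin{align*}
  \intertext{Here is some text}
  x^{2}+2x+3=2\\
  \intertext{Here is some more}
  \end{align*}
TEX

begin_align = /\\begin\{([^}]*)\}\n\\intertext\{([^}]*)\}/m

m = str.match(begin_align)
literal = m[0] # the matched text, metacharacters and all

# String#match compiles a String argument into a Regexp, so "\b" becomes a
# word boundary and "n*" a quantified "n": the text no longer matches itself.
as_regexp = str.match(literal)

# Escaping the metacharacters first makes the compiled pattern literal again.
as_quoted = str.match(Regexp.quote(literal))

# String#sub searches for a String pattern verbatim, without compiling it.
sub_literal = str.sub(literal, '')

# The escaped text, with its extra backslashes, is not a substring of str,
# so nothing is replaced.
sub_quoted = str.sub(Regexp.quote(literal), '')

# The same effect in miniature: this String holds ONE backslash, so as a
# pattern it reads \b (word boundary) followed by "egin{align*".
short = "\\begin{align*"
short_plain  = short.match(short)
short_quoted = short.match(Regexp.quote(short))

p as_regexp           # nil
p as_quoted[0] == literal
p sub_quoted == str   # unchanged
```

If a Regexp is wanted for .sub() as well, wrapping the escaped text as Regexp.new(Regexp.quote(literal)) should behave the same as passing literal directly.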

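So for .match() to work with a String, you have to escape such special characters yourself, for example with Regexp.quote. For .sub(), on the other hand, running the text through Regexp.quote and passing the result as a String only breaks things: the String is searched for literally, and the backslashes that Regexp.quote adds do not occur in str, so nothing is replaced.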
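For the original goal of moving the intertext strings outside the environment, the round trip through m[0] may not be needed at all. One possible sketch, assuming the two Regexps from the question are used unchanged, is to pass them straight to gsub and rebuild the text from the capture groups; the block form and the variable name moved are choices made here, not part of the original post.

```ruby
# The question's string; a single-quoted heredoc keeps every backslash literal.
str = <<~'TEX'
  \begin{align*}
  \intertext{Here is some text}
  x^{2}+2x+3=2\\
  \intertext{Here is some more}
  \end{align*}
TEX

begin_align = /\\begin\{([^}]*)\}\n\\intertext\{([^}]*)\}/m
end_align   = /\\intertext\{([^}]*)\}\n\\end\{([^}]*)\}/m

# The block form of gsub is used so that backslashes in the replacement are
# not reinterpreted as back-references such as \1.
moved = str
  .gsub(begin_align) { "#{$2}\n\\begin{#{$1}}" } # text first, then \begin{env}
  .gsub(end_align)   { "\\end{#{$2}}\n#{$1}" }   # \end{env} first, then text

puts moved
```

Because the patterns are Regexp objects rather than Strings, the question of how a String pattern gets interpreted does not arise here.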
