
What's the correct syntax for escapes and captures in a perl one-liner?

I'm attempting to use pandoc to convert latex files (which were automatically generated by doxygen) to .docx format. I have encountered an error, perhaps in doxygen, which allows some characters that should be escaped ( _ and % ) to go unescaped in the DoxyCode latex environment. Some underscores occur in filenames, and are inside braces. Those should not be escaped.

I wrote a perl one-liner that locates any underscores or percents that aren't between braces, and replaces them with a backslash followed by the same character:

perl -i -pe 's/(?<!\\)([_%])(?![^{]+})/\\$1/g' test.tex

This works as expected. However, I then discovered that some of the files contain, e.g., an initializer list within braces, with some variables containing underscores, inside a DoxyCode environment. So I need a perl script that can recognize when the underscore or percent is between \begin{DoxyCode} and \end{DoxyCode} and insert a backslash if there is none.

The regex for this command works on regex101 (see https://regex101.com/r/gsQm2L/2), although it only grabs the first match. I'm hoping perl can grab the other matches, but I may be mistaken.

The command I have is

perl -i -pe 's/(?<=begin\{DoxyCode})([\s\S]+?[^\\])([_%])([\s\S]+?)(?=end\{DoxyCode})/$1\\$2$3/g' test.tex

but it fails to make any changes. (I tried not escaping the left braces, but I got an error: Unescaped left brace in regex is deprecated, passed through in regex; etc.) I can't tell whether it's failing to find matches or failing to replace them because my capture syntax is incorrect.

For both the first and second example, the original contents of test.tex are as follows:

\begin{DoxyCode}                                                                                                     
17 This is some code that contains an_undersc_ore and                                                                
18 an escaped\_underscore. Plus another unescaped_unders_core                                                        
19 for good measure.                                                                                                 
20 As if that was not "bad" enough, it also contains a %percent sign                                                 
21 that is unescaped.                                                                                                
\end{DoxyCode}                                                                                                       

Here is some other stuff that may contain \index{things_not_to_be_escaped}.                                          

\begin{DoxyCode}                                                                                                     
17 This is some code that contains an_underscore and                                                                 
18 an escaped\_underscore. Plus another unescaped_underscore                                                         
19 for good measure.                                                                                                 
20 As if that was not "bad" enough, it also contains a \%percent sign                                                
21 that is escaped.                                                                                                  
\end{DoxyCode}     

The desired content of test.tex, after running the perl command, would be the following:

\begin{DoxyCode}                                                                                                     
17 This is some code that contains an\_undersc\_ore and                                                                
18 an escaped\_underscore. Plus another unescaped\_unders\_core                                                        
19 for good measure.                                                                                                 
20 As if that was not "bad" enough, it also contains a \%percent sign                                                 
21 that is unescaped.                                                                                                
\end{DoxyCode}                                                                                                       

Here is some other stuff that may contain \index{things_not_to_be_escaped}.                                          

\begin{DoxyCode}                                                                                                     
17 This is some code that contains an\_underscore and                                                                 
18 an escaped\_underscore. Plus another unescaped\_underscore                                                         
19 for good measure.                                                                                                 
20 As if that was not "bad" enough, it also contains a \%percent sign                                                
21 that is escaped.                                                                                                  
\end{DoxyCode}     

Why is my perl one-liner failing? And how do I get the desired output? I'm by no means a perl or regex expert, so I welcome feedback on other errors.

In case it's relevant, I'm working on debian stretch, and perl --version returns

This is perl 5, version 24, subversion 1 (v5.24.1) built for x86_64-linux-gnu-thread-multi
(with 85 registered patches, see perl -V for more detail)

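Easy. While the "right" way to do this is with a regex parser, it's still simple enough that you could do it with a one-liner. The key is a two-stage substitution: first isolate the text between \begin{DoxyCode} and \end{DoxyCode}, then escape the bare _ and % inside that text. Your one-liner makes no changes because perl -p hands the substitution one line at a time, so a pattern that has to see both \begin{DoxyCode} and \end{DoxyCode} can never match; read the whole file at once (for example with -0777) if the pattern spans lines. I added a test case for literal backslashes (\\) that do not start an escape for a _ or %. If there could be other embedded {}, they can be excluded with the same approach.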

use strict;
use warnings;

my $text = <<'EOF';
\begin{DoxyCode}
17 This is some code that contains an_undersc_ore and
18 an escaped\_underscore. Plus another unescaped_unders_core
19 for good measure. A literal \ and a literal \\_.
20 As if that was not "bad" enough, it also contains a %percent sign
21 that is unescaped.
\end{DoxyCode}

Here is some other stuff that may contain \index{things_not_to_be_escaped}.

\begin{DoxyCode}
17 This is some code that contains an_underscore and
18 an escaped\_underscore. Plus another unescaped_underscore
19 for good measure. A literal \\%.
20 As if that was not "bad" enough, it also contains a \%percent sign
21 that is escaped.
\end{DoxyCode}
EOF

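print "before:\n$text\n\n";
# Stage 1: match the body of each DoxyCode environment; \K keeps \begin{DoxyCode} out of the replaced text.
$text =~ s{\Q\begin{DoxyCode}\E\K(.+?)(\Q\end{DoxyCode}\E)}{
    my($t,$e) = ($1,$2);
    # Stage 2: match a backslash pair or an optionally escaped _ or %;
    # only a bare _ or % has length 1, and only that gets a backslash.
    $t =~ s{(\\\\ | \\?[_%])}{1==length $1 ? "\\$1" : $1}egsx; "$t$e";
}egs;
print "after:\n$text\n";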

Output:

before:
\begin{DoxyCode}
17 This is some code that contains an_undersc_ore and
18 an escaped\_underscore. Plus another unescaped_unders_core
19 for good measure. A literal \ and a literal \\_.
20 As if that was not "bad" enough, it also contains a %percent sign
21 that is unescaped.
\end{DoxyCode}

Here is some other stuff that may contain \index{things_not_to_be_escaped}.

\begin{DoxyCode}
17 This is some code that contains an_underscore and
18 an escaped\_underscore. Plus another unescaped_underscore
19 for good measure. A literal \\%.
20 As if that was not "bad" enough, it also contains a \%percent sign
21 that is escaped.
\end{DoxyCode}


after:
\begin{DoxyCode}
17 This is some code that contains an\_undersc\_ore and
18 an escaped\_underscore. Plus another unescaped\_unders\_core
19 for good measure. A literal \ and a literal \\\_.
20 As if that was not "bad" enough, it also contains a \%percent sign
21 that is unescaped.
\end{DoxyCode}

Here is some other stuff that may contain \index{things_not_to_be_escaped}.

\begin{DoxyCode}
17 This is some code that contains an\_underscore and
18 an escaped\_underscore. Plus another unescaped\_underscore
19 for good measure. A literal \\\%.
20 As if that was not "bad" enough, it also contains a \%percent sign
21 that is escaped.
\end{DoxyCode}
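
For comparison, here is a rough Python rendering of the same two-stage idea. It is only a sketch, not part of the Perl solution: the names BLOCK, TOKEN, escape_token and escape_doxycode are made up for illustration, and the sample is a shortened version of the test data above. It also shows why the pattern has to see the whole text: applied one line at a time (which is what perl -p does unless the file is slurped), the outer pattern never finds a \begin{DoxyCode} and \end{DoxyCode} pair in the same string.

```python
import re

# Stage 1: the body of each DoxyCode environment (non-greedy, may span newlines).
BLOCK = re.compile(r"(\\begin\{DoxyCode\})(.+?)(\\end\{DoxyCode\})", re.DOTALL)
# Stage 2: a backslash pair, or an optionally escaped _ or %.
TOKEN = re.compile(r"\\\\|\\?[_%]")


def escape_token(m):
    tok = m.group(0)
    # Only a bare "_" or "%" has length 1; pairs and existing escapes pass through.
    return "\\" + tok if len(tok) == 1 else tok


def escape_doxycode(text):
    # Rewrite only the body (group 2); the environment markers are kept as-is.
    return BLOCK.sub(
        lambda m: m.group(1) + TOKEN.sub(escape_token, m.group(2)) + m.group(3),
        text,
    )


sample = (
    "\\begin{DoxyCode}\n"
    "17 an_undersc_ore, an escaped\\_underscore, a literal \\\\_ and a %percent\n"
    "\\end{DoxyCode}\n"
    "\\index{things_not_to_be_escaped}\n"
)

# Whole text at once: the environment is found and its body is escaped.
whole = escape_doxycode(sample)
# One line at a time: no single line holds both markers, so nothing changes.
per_line = "".join(escape_doxycode(line) for line in sample.splitlines(keepends=True))

print(whole)
print(per_line == sample)
```

The callback checks the length of the match for the same reason the Perl version does: a single alternation consumes backslash pairs and existing escapes, so they cannot be mistaken for a bare _ or %.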

Also read http://perldoc.perl.org/perlre.html and http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators. Pay special attention to the \G assertion and the /gc flags. That is how you would write a proper parser for this task.
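
To give a feel for what such a scanner looks like, here is a small sketch in Python. Python's re module has no \G, so this uses Pattern.match(text, pos), which only matches starting at pos, as a stand-in for the \G plus /gc idiom. The token names and the function escape_with_scanner are assumptions made for illustration, and the token set is deliberately minimal.

```python
import re

# One named alternative per token kind; the environment markers come first
# so they win over the generic "backslash plus one character" token.
TOKEN = re.compile(
    r"(?P<begin>\\begin\{DoxyCode\})"
    r"|(?P<end>\\end\{DoxyCode\})"
    r"|(?P<escaped>\\.)"            # a backslash plus any one character
    r"|(?P<special>[_%])"           # a bare underscore or percent
    r"|(?P<other>[^\\_%]+|\\)",     # plain text, or a lone trailing backslash
    re.DOTALL,
)


def escape_with_scanner(text):
    out = []
    pos = 0
    in_code = False
    while pos < len(text):
        # Anchored at pos; the alternatives cover every character, so this always matches.
        m = TOKEN.match(text, pos)
        kind, tok = m.lastgroup, m.group(0)
        if kind == "begin":
            in_code = True
        elif kind == "end":
            in_code = False
        elif kind == "special" and in_code:
            tok = "\\" + tok  # escape only inside a DoxyCode environment
        out.append(tok)
        pos = m.end()
    return "".join(out)


sample = (
    "\\begin{DoxyCode}\n"
    "17 an_undersc_ore, an escaped\\_underscore, a literal \\\\_ and a %percent\n"
    "\\end{DoxyCode}\n"
    "\\index{things_not_to_be_escaped}\n"
)
result = escape_with_scanner(sample)
print(result)
```

Because the scanner carries explicit state (in_code), handling further cases such as nested braces would mean adding token kinds and state rather than growing a single regex.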

HTH
