Python + Regex + Replace pattern with multiple copies of that pattern

I have to take a string from the user and format it so that it is acceptable for certain command-line consumption. Basically, I need to replace any run of backslashes that comes directly before a double quote (") with twice as many backslashes. I can find the pattern using this regex:

import re

pattern = '\\\\+"'
string = "\\\\\\\" asdf \\\" \\ \\ \\\\\""

print(string, "\n")
matches = re.findall(pattern, string)

But now that I have those matches, how do I replace them with double copies of themselves? The 3 backslashes in front of a quote have to become 6, the 1 backslash becomes 2, and the 2 become 4. The backslashes that are not in front of a quote stay the same length.

Any advice on this would be greatly appreciated.

Thanks.

You should use single quotes, raw strings, and re.sub:

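import re

# Raw string, so every backslash below is a literal character.
string = r'\\\" asdf \" \ \ \\"'
# Capture each run of backslashes before a quote and emit it twice.
new_string = re.sub(r'(\\+)"', r'\1\1"', string)
print(new_string)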

Output:

\\\\\\" asdf \\" \ \ \\\\"

The Pattern

To explain the pattern, first let's remove the parentheses; they don't affect what's matched, and we'll put them back later. The pattern r'\\+"' means "one or more backslashes followed by a double quote". Even though it's a raw string, we still have to escape the backslash, because backslashes have special meaning in regular expressions: \\ matches one literal backslash, and + repeats it. That's why it's r'\\+"' instead of r'\+"'.
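As a quick check of that reading, here is a minimal sketch that runs the pattern without parentheses through re.findall. It assumes the same sample string as in the answer; the variable names sample and matches are only illustrative.

```python
import re

# Same sample text as in the answer (raw string, so backslashes are literal).
sample = r'\\\" asdf \" \ \ \\"'

# No capture group here, so findall returns the full text of each match:
# a run of backslashes plus the double quote that follows it.
matches = re.findall(r'\\+"', sample)

for m in matches:
    print(m, len(m) - 1, "backslash(es)")
```

The two lone backslashes in the middle are followed by a space rather than a quote, so they should not appear among the matches.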

The Parentheses

The parentheses around the \\+ in the actual pattern, r'(\\+)"', just mean "capture the part of the match inside these parentheses". This puts the run of backslashes from each match into a capture group, leaving the double quote outside it. We're going to use this capture group in the replacement string.
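To see what the capture group holds, the following sketch iterates over the matches with re.finditer and compares the whole match with group 1. The sample string is the one from the answer; group_lengths is a name introduced here purely for illustration.

```python
import re

sample = r'\\\" asdf \" \ \ \\"'
pattern = r'(\\+)"'

for m in re.finditer(pattern, sample):
    # group(0) is the whole match (backslashes plus quote);
    # group(1) is only the captured run of backslashes.
    print(repr(m.group(0)), repr(m.group(1)))

# Number of backslashes captured for each match.
group_lengths = [len(m.group(1)) for m in re.finditer(pattern, sample)]
print(group_lengths)
```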

The Replacement String

The replacement string, r'\1\1"', just means "two copies of the first capture group followed by a double quote" (in this case there's only one capture group, but there can be more). The replacement string has a double quote because the match had one; since the entire match is replaced by the replacement string, leaving the double quote out of the replacement would remove the double quotes from the result.
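Putting the pieces together, a minimal sketch might wrap the substitution in a helper. The function name double_backslashes_before_quotes is hypothetical, not something from the question. The sketch also shows an alternative way to write the same replacement as a callable, and what happens when the quote is left out of the replacement string.

```python
import re

def double_backslashes_before_quotes(text):
    """Double every run of backslashes that directly precedes a double quote."""
    return re.sub(r'(\\+)"', r'\1\1"', text)

sample = r'\\\" asdf \" \ \ \\"'
doubled = double_backslashes_before_quotes(sample)
print(doubled)

# The same replacement written as a function of the match object.
via_callable = re.sub(r'(\\+)"', lambda m: m.group(1) * 2 + '"', sample)
print(via_callable == doubled)

# Dropping the quote from the replacement removes the quotes from the result.
without_quote = re.sub(r'(\\+)"', r'\1\1', sample)
print(without_quote)
```

The callable form is a matter of taste here; it tends to be easier to adapt if the multiplier ever needs to be something other than 2.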
