
How to avoid re.sub processing backslashes in the replacement string in Python 3.10.5?

I'm running into the following error when trying to use re.sub: (<class 're.error'>, error('bad escape \S at position 51'), <traceback object at 0x00000230E5F63580>)

My code is probably a mess, because I started learning Python a week ago. The script walks through certain (XML) files in a given source folder, copies a part of each file based on a regex, and pastes it into the file with the same name in a given target folder, replacing the part there that matches the same regex. This works fine, but as soon as the replacement string contains \S (which can happen, because it is a path), the replacement fails.

Here is my code (sorry for the mess):

import re
import os
from tkinter import messagebox
from tkinter import simpledialog

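# Dialog texts (German): "Quelldateien"/"Zieldateien" = "Source files"/"Target files",
# "Pfad zu den Quelldateien/Zieldateien" = "Path to the source/target files"
source_input = simpledialog.askstring(title="Quelldateien", prompt="Pfad zu den Quelldateien:\t\t\t\t\t\t\t\t\t")
target_input = simpledialog.askstring(title="Zieldateien", prompt="Pfad zu den Zieldateien:\t\t\t\t\t\t\t\t\t")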
search_pattern = re.compile("<reference>.*?</reference>", re.DOTALL)

for path, subdirs, files in os.walk(source_input):
    for filename in files:
        if filename.endswith(".sdlxliff"):
            source_file = open(path + os.sep + filename, 'r', encoding="utf8")
            source_content = source_file.read()
            source_file.close()
            source_reference = re.search(search_pattern, source_content)
            source_reference_string = source_reference.group(0)

            target_path = path.replace(source_input, target_input)
            if os.path.exists(target_path + os.sep + filename):
                target_file = open(target_path + os.sep + filename, 'r', encoding="utf8")
                target_content = target_file.read()
                target_file.close()

                # A string repl is parsed for backslash escapes, so this call raises re.error
                newdata = re.sub(search_pattern, source_reference_string, target_content)

                target_file = open(target_path + os.sep + filename, 'w', encoding="utf8")
                target_file.write(newdata)
                target_file.close()

# "Erledigt" = "Done"; "Der Referenzteil wurde ersetzt." = "The reference part was replaced."
messagebox.showinfo(title="Erledigt", message="Der Referenzteil wurde ersetzt.")

The replacement string passed to re.sub (the source_reference_string variable) looks like this:

<reference><external-file href="file://C:\\_Projekte\\S$$$\\220909_error_ZV\\en-US\\$$$$ - Kopie.pptx" uid="Pptx.DependencyFileId"/></reference>
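A minimal reproduction, assuming the doubled backslashes above are just how Python displays the value (its repr), so the real string contains single backslashes. Under that assumption the reported position 51 is exactly where \S starts: re.sub treats a string repl as a template, passes unknown escapes of non-letters such as the one before "_" through unchanged, and rejects a backslash followed by an ASCII letter it does not recognise. The target text is hypothetical sample data.

```python
import re

# Assumption: the real string holds single backslashes; the doubled ones
# shown in the question are how Python displays (repr) the value.
replacement = r'<reference><external-file href="file://C:\_Projekte\S$$$\220909_error_ZV\en-US\$$$$ - Kopie.pptx" uid="Pptx.DependencyFileId"/></reference>'

pattern = re.compile("<reference>.*?</reference>", re.DOTALL)
target = "<file><reference>old</reference></file>"  # hypothetical target content

try:
    re.sub(pattern, replacement, target)
except re.error as err:
    # The string repl is parsed as a template: the escape before "_" is kept,
    # but a backslash followed by the ASCII letter "S" is rejected.
    print(err)      # bad escape \S at position 51
    print(err.pos)  # 51
```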

I found the thread "Python 3.7.4: 're.error: bad escape \s at position 0'" and tried replacing re with the regex module, but I ran into the same error.

I would like re.sub to just take the replacement string without interpreting any backslashes.
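If the replacement has to stay a plain string, one workaround (an alternative to passing a function, not something the thread itself uses) is to double every backslash before calling re.sub. In a replacement template a doubled backslash stands for one literal backslash, and the backslash is the only character re.sub treats specially there, so the doubled string comes out exactly as the original. The helper name literal_repl and the target text are hypothetical.

```python
import re


def literal_repl(text: str) -> str:
    # Hypothetical helper: double each backslash so that re.sub's template
    # parser turns every pair back into one literal backslash.
    return text.replace("\\", "\\\\")


pattern = re.compile("<reference>.*?</reference>", re.DOTALL)
replacement = r'<reference><external-file href="file://C:\_Projekte\S$$$\220909_error_ZV\en-US\$$$$ - Kopie.pptx" uid="Pptx.DependencyFileId"/></reference>'
target = "<file><reference>old</reference></file>"  # hypothetical target content

newdata = re.sub(pattern, literal_repl(replacement), target)
print(newdata == "<file>" + replacement + "</file>")  # True
```

Passing a function as repl avoids this extra escaping step entirely.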

Thanks for any help.

According to the documentation of re.sub, the repl argument can be a string or a function. If it is a string, any backslash escapes in it are processed. If it is a function, it is called with the match object for every match, and the string it returns is inserted as-is, without any escape processing. So you can pass a lambda that ignores the match and just returns your string:

newdata = re.sub(search_pattern, lambda _: source_reference_string, target_content)
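Applied to the question's script, this is a one-line change at the re.sub call. Below is a minimal sketch rather than the asker's exact code: the tkinter dialogs are replaced by function parameters, files are opened with "with", the target folder is built with os.path.relpath instead of path.replace(...) (which replaces every occurrence of the source path text, not just the leading one), and a source file without a <reference> block is skipped instead of raising AttributeError on None.group(0). The name copy_references and the demo file contents are hypothetical.

```python
import os
import re
import tempfile

SEARCH_PATTERN = re.compile("<reference>.*?</reference>", re.DOTALL)


def copy_references(source_root: str, target_root: str) -> int:
    """Copy each .sdlxliff file's <reference> block into the same-named
    file under target_root; return how many target files were rewritten."""
    updated = 0
    for path, _subdirs, files in os.walk(source_root):
        for filename in files:
            if not filename.endswith(".sdlxliff"):
                continue
            with open(os.path.join(path, filename), encoding="utf8") as f:
                match = SEARCH_PATTERN.search(f.read())
            if match is None:  # no <reference> block in this source file
                continue
            source_reference = match.group(0)

            # Same relative folder, but under target_root.
            rel_dir = os.path.relpath(path, source_root)
            target_file = os.path.join(target_root, rel_dir, filename)
            if not os.path.exists(target_file):
                continue
            with open(target_file, encoding="utf8") as f:
                target_content = f.read()

            # The fix: a function repl's return value is inserted verbatim.
            newdata = SEARCH_PATTERN.sub(lambda _: source_reference, target_content)

            with open(target_file, "w", encoding="utf8") as f:
                f.write(newdata)
            updated += 1
    return updated


# Demo with temporary folders (hypothetical file name and contents).
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    ref = r'<reference><external-file href="file://C:\_Projekte\S$$$\en-US\a.pptx"/></reference>'
    with open(os.path.join(src, "a.sdlxliff"), "w", encoding="utf8") as f:
        f.write("<xliff>" + ref + "</xliff>")
    with open(os.path.join(dst, "a.sdlxliff"), "w", encoding="utf8") as f:
        f.write("<xliff><reference>old</reference><body/></xliff>")

    count = copy_references(src, dst)
    with open(os.path.join(dst, "a.sdlxliff"), encoding="utf8") as f:
        result = f.read()

print(count)                                          # 1
print(result == "<xliff>" + ref + "<body/></xliff>")  # True
```

Because the lambda ignores the match object, every <reference> block in a target file is replaced with the same text, just as in the original loop.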
