I'm trying to read a file that may contain strings that include \\, \n, and \t, and I want to write those to another file as \, newline, and tab. My attempt with re.sub doesn't seem to be working in my .py file, but it seems to be working in the interpreter.
Here's the function I wrote to try to achieve this:
def escape_parser(snippet):
    snippet = re.sub(r"\\", "\\", snippet)
    snippet = re.sub(r"\t", "\t", snippet)
    snippet = re.sub(r"\n", "\n", snippet)
    return snippet
which causes sre_constants.error: bogus escape (end of line) when the backslash replacement line is included, and doesn't appear to replace the literal string \t or \n with a tab or newline when I comment out the backslash line.
I played around in the interpreter to see if I could figure out a solution, but everything behaved as I'd (naively) expect.
$ python3
Python 3.4.0 (default, Mar 24 2014, 02:28:52)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> test = "for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}"
>>> import re
>>> test = "for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}"
>>> print(re.sub(r"\n", "\n", test))
for(int ${1:i}; $1 < ${2:STOP}; ++$1)
{
$0
}
>>> print(test)
for(int ${1:i}; $1 < ${2:STOP}; ++$1)
{
$0
}
>>> test
'for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}'
>>> t2 = re.sub(r"\n", "foo", test)
>>> t2
'for(int ${1:i}; $1 < ${2:STOP}; ++$1)foo{foo\t$0foo}'
As for actually writing to the file, I have
with open(os.path.join(target_path, name), "w") as out:
    out.write(snippet)
Although I've tried using print(snippet, end="", file=out), too.
Edit: I've looked at similar questions like "Python how to replace backslash with re.sub()" and "How to write list of strings to file, adding newlines?", but those solutions don't quite work, and I'd really like to do this with a regex if possible because regexes seem like a more powerful tool than Python's standard string processing functions.
Edit2: Not sure if this helps, but I thought I'd try to print what's going on in the function:
def escape_parser(snippet):
    print(snippet)
    print("{!r}".format(snippet))
    # snippet = re.sub(r"\\", "\\", snippet)
    snippet = re.sub(r"\t", "\t", snippet)
    snippet = re.sub(r"\n", "\n", snippet)
    print(snippet)
    print("{!r}".format(snippet))
    return snippet
yields
for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}
'for(int ${1:i}; $1 < ${2:STOP}; ++$1)\\n{\\n\\t$0\\n}'
for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}
'for(int ${1:i}; $1 < ${2:STOP}; ++$1)\\n{\\n\\t$0\\n}'
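The repr output above shows that the string read from the file holds a backslash followed by the letter n, whereas the interpreter test string held real newline characters. The following is a minimal sketch of that difference, not the original code; the variable names from_file and from_literal are made up for illustration.

```python
import re

# What the file contains: a backslash followed by the letter n (two characters).
from_file = r"{\n\t$0\n}"
# What the interpreter test used: real newline and tab characters.
from_literal = "{\n\t$0\n}"

# The regex r"\n" matches a newline character, so it finds nothing in from_file.
assert re.search(r"\n", from_file) is None
assert re.search(r"\n", from_literal) is not None

# To match a backslash followed by n, the backslash must be escaped in the pattern.
converted = re.sub(r"\\n", "\n", from_file)
converted = re.sub(r"\\t", "\t", converted)
assert converted == from_literal
```

This would explain why the same substitutions looked fine in the interpreter but did nothing to the file contents.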
Edit3: Changing snippet = re.sub(r"\\", "\\", snippet) to snippet = re.sub(r"\\", r"\\", snippet) as per @BrenBarn's advice, and adding a test string in my source file yields
insert just one backslash: \\ (that's it)
"insert just one backslash: \\\\ (that's it)"
insert just one backslash: \\ (that's it)
"insert just one backslash: \\\\ (that's it)"
So I must have missed something obvious. It's a good thing one doesn't need a license to program.
Edit4: As per "Process escape sequences in a string in Python", I changed escape_parser to this:
def escape_parser(snippet):
    print("pre-escaping: '{}'".format(snippet))
    # snippet = re.sub(r"\\", r"\\", snippet)
    # snippet = re.sub(r"\t", "\t", snippet)
    # snippet = re.sub(r"\n", "\n", snippet)
    snippet = bytes(snippet, "utf-8").decode("unicode_escape")
    print("post-escaping: '{}'".format(snippet))
    return snippet
which works in a sense. My original intention was to only replace \\, \n, and \t, but this goes further than that, which isn't exactly what I wanted. (It appears print and write work the same for these. I may have been mistaken about print and write not matching up, because the editor I was using to inspect the output files apparently wouldn't refresh when the files changed.) Here's how things look after being run through the function:
pre-escaping: 'for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}'
post-escaping: 'for(int ${1:i}; $1 < ${2:STOP}; ++$1)
{
$0
}'
pre-escaping: 'insert just one backslash: \\ (that's it)'
post-escaping: 'insert just one backslash: \ (that's it)'
pre-escaping: 'source has one backslash \ <- right there'
post-escaping: 'source has one backslash \ <- right there'
pre-escaping: 'what about a bell \a like that?'
post-escaping: 'what about a bell like that?'
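If the goal is to translate only \\, \n, and \t and leave every other escape (such as \a) alone, one possible approach is a single re.sub pass with a function as the replacement. This is a sketch under assumptions, not a tested drop-in for the code above: the names ESCAPES and unescape_basic are hypothetical. Doing it in one pass also avoids misreading an escaped backslash followed by n as a newline.

```python
import re

# Hypothetical mapping: only these three escapes are translated.
ESCAPES = {"\\": "\\", "n": "\n", "t": "\t"}

def unescape_basic(snippet):
    # Match a backslash followed by a backslash, n, or t.
    # A function replacement is inserted as-is, so its return value
    # is not itself scanned for backslash escapes.
    return re.sub(r"\\([\\nt])", lambda m: ESCAPES[m.group(1)], snippet)

# Usage: raw strings stand in for text read from a file.
print(repr(unescape_basic(r"{\n\t$0\n}")))
print(repr(unescape_basic(r"what about a bell \a like that?")))
```

Because the pattern consumes both characters of each escape in one left-to-right scan, the three-character input backslash, backslash, n becomes a backslash followed by n rather than a backslash followed by a newline.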
It's hard to tell if this is your main problem without seeing some data, but one problem is that you need to change your first replace to:
snippet = re.sub(r"\\", r"\\", snippet)
The reason is that backslashes have meaning in the replacement pattern as well (for group backreferences), so a single backslash is not a valid replacement string.
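A short sketch of that behaviour follows, using made-up sample strings. The exact wording of the error message differs between Python versions, so the example only checks that re.error is raised. Note that the corrected call replaces one backslash with one backslash and so leaves the text unchanged; collapsing a doubled backslash into a single one would additionally need a pattern that matches two backslashes.

```python
import re

text = r"a\b"  # the characters a, one backslash, b

# A lone backslash is an incomplete escape in the replacement template.
try:
    re.sub(r"\\", "\\", text)
    raised = False
except re.error:
    raised = True

# Escaping the backslash in the replacement makes the template valid.
# This swaps each backslash for a backslash, so the text is unchanged.
unchanged = re.sub(r"\\", r"\\", text)

# Matching two backslashes and inserting one collapses the pair.
collapsed = re.sub(r"\\\\", r"\\", r"a\\b")

print(raised, repr(unchanged), repr(collapsed))
```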