Python: unexpected behavior with printing/writing escape characters

I'm trying to read a file that may contain strings with the literal two-character sequences \\, \n, and \t, and I want to write those to another file as \, newline, and tab. My attempt with re.sub doesn't work in my .py file, but it seems to work in the interpreter.

Here's the function I wrote to try to achieve this:

def escape_parser(snippet):
    snippet = re.sub(r"\\", "\\", snippet)
    snippet = re.sub(r"\t", "\t", snippet)
    snippet = re.sub(r"\n", "\n", snippet)

    return snippet

which raises sre_constants.error: bogus escape (end of line) when the backslash replacement line is included. When I comment out that line, the literal strings \t and \n still aren't replaced with a tab or newline.

I played around in the interpreter to see if I could figure out a solution, but everything behaved as I'd (naively) expect.

$ python3
Python 3.4.0 (default, Mar 24 2014, 02:28:52) 
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> test = "for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}"
>>> import re
>>> test = "for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}"
>>> print(re.sub(r"\n", "\n", test))
for(int ${1:i}; $1 < ${2:STOP}; ++$1)
{
    $0
}
>>> print(test)
for(int ${1:i}; $1 < ${2:STOP}; ++$1)
{
    $0
}
>>> test
'for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}'
>>> t2 = re.sub(r"\n", "foo", test)
>>> t2
'for(int ${1:i}; $1 < ${2:STOP}; ++$1)foo{foo\t$0foo}'

As for actually writing to the file, I have

with open(os.path.join(target_path, name), "w") as out:
    out.write(snippet)

although I've tried using print(snippet, end="", file=out), too.

Edit: I've looked at similar questions like "Python how to replace backslash with re.sub()" and "How to write list of strings to file, adding newlines?", but those solutions don't quite work. I'd really like to do this with a regex if possible, because regexes seem like a more powerful tool than Python's standard string processing functions.

Edit2: Not sure if this helps, but I thought I'd try to print what's going on in the function:

def escape_parser(snippet):
    print(snippet)
    print("{!r}".format(snippet))

    # snippet = re.sub(r"\\", "\\", snippet)
    snippet = re.sub(r"\t", "\t", snippet)
    snippet = re.sub(r"\n", "\n", snippet)

    print(snippet)
    print("{!r}".format(snippet))

    return snippet

yields

for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}
'for(int ${1:i}; $1 < ${2:STOP}; ++$1)\\n{\\n\\t$0\\n}'
for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}
'for(int ${1:i}; $1 < ${2:STOP}; ++$1)\\n{\\n\\t$0\\n}'

Edit3: Changing snippet = re.sub(r"\\", "\\", snippet) to snippet = re.sub(r"\\", r"\\", snippet) as per @BrenBarn's advice, and adding a test string in my source file, yields

insert just one backslash: \\ (that's it)
"insert just one backslash: \\\\ (that's it)"
insert just one backslash: \\ (that's it)
"insert just one backslash: \\\\ (that's it)"

So I must have missed something obvious. It's a good thing one doesn't need a license to program.

Edit4: As per "Process escape sequences in a string in Python", I changed escape_parser to this:

def escape_parser(snippet):
    print("pre-escaping: '{}'".format(snippet))

    # snippet = re.sub(r"\\", r"\\", snippet)
    # snippet = re.sub(r"\t", "\t", snippet)
    # snippet = re.sub(r"\n", "\n", snippet)
    snippet = bytes(snippet, "utf-8").decode("unicode_escape")

    print("post-escaping: '{}'".format(snippet))

    return snippet

which works in a sense. My original intention was to replace only \\, \n, and \t, but this decodes every escape sequence, which isn't exactly what I wanted. Here's how things look after being run through the function. (It appears print and write work the same for these. I may have been mistaken about them not matching up, because the editor I was using to inspect the output files apparently didn't refresh when the files changed.)

pre-escaping: 'for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}'
post-escaping: 'for(int ${1:i}; $1 < ${2:STOP}; ++$1)
{
    $0
}'
pre-escaping: 'insert just one backslash: \\ (that's it)'
post-escaping: 'insert just one backslash: \ (that's it)'
pre-escaping: 'source has one backslash \ <- right there'
post-escaping: 'source has one backslash \ <- right there'
pre-escaping: 'what about a bell \a like that?'
post-escaping: 'what about a bell  like that?'

It's hard to tell if this is your main problem without seeing some data, but one problem is that you need to change your first replace to:

snippet = re.sub(r"\\", r"\\", snippet)

The reason is that backslashes have meaning in the replacement string as well (for escapes and group backreferences such as \1). The non-raw literal "\\" is a single backslash character, and a single backslash on its own is not a valid replacement string.
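If the goal is to translate only the three sequences named in the question and leave every other backslash alone, one possible approach is a single re.sub pass with a function as the replacement. This is a minimal sketch, not necessarily what the asker ended up using: the names _ESCAPES and _ESCAPE_RE are made up for illustration. A replacement function's return value is inserted as-is, so the backslash rules for replacement strings never apply, and a single pass avoids re-processing the backslash produced from \\.

```python
import re

# Hypothetical lookup table: only these three two-character sequences
# are translated; anything else is left untouched.
_ESCAPES = {
    "\\\\": "\\",  # backslash + backslash -> one backslash
    "\\n": "\n",   # backslash + n         -> newline
    "\\t": "\t",   # backslash + t         -> tab
}

# A literal backslash followed by a backslash, 'n', or 't'.
_ESCAPE_RE = re.compile(r"\\[\\nt]")


def escape_parser(snippet):
    r"""Translate only \\, \n and \t, in one left-to-right pass."""
    # The function's return value is used verbatim, so no escape
    # processing is applied to the replacement text.
    return _ESCAPE_RE.sub(lambda m: _ESCAPES[m.group(0)], snippet)


if __name__ == "__main__":
    print(escape_parser(r"for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}"))
    print(escape_parser(r"insert just one backslash: \\ (that's it)"))
    print(escape_parser(r"what about a bell \a like that?"))
```

Unlike the unicode_escape decoding from Edit4, this leaves \a and a lone backslash followed by a space unchanged, because neither matches the pattern.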
