Python: unexpected behavior with printing/writing escape characters
I'm trying to read a file that may contain strings that include \\, \n, and \t, and I want to write those to another file as \, newline, and tab. My attempt with re.sub doesn't seem to be working in my .py file, but it seems to be working in the interpreter.
Here's the function I wrote to try to achieve this:
def escape_parser(snippet):
    snippet = re.sub(r"\\", "\\", snippet)
    snippet = re.sub(r"\t", "\t", snippet)
    snippet = re.sub(r"\n", "\n", snippet)
    return snippet
which causes sre_constants.error: bogus escape (end of line) when the backslash replacement line is included, and doesn't appear to replace the literal strings \t or \n with a tab or newline when I comment out the backslash line.
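Both symptoms can be reproduced outside the function. A minimal sketch, assuming the file contents are held in an ordinary str (the sample text is made up): a replacement string consisting of one backslash is rejected by the replacement-template parser, and the pattern r"\n" matches an actual newline character rather than a backslash followed by n.

```python
import re

# Hypothetical sample text as it would look after being read from a file:
# it holds the two-character sequences backslash+n and backslash+t, not real newlines or tabs.
sample = r"for(;;)\n{\n\t$0\n}"

def lone_backslash_replacement_fails() -> bool:
    """Return True if a replacement string made of a single backslash raises re.error."""
    try:
        re.sub(r"\\", "\\", sample)  # "\\" is ONE backslash character
    except re.error:  # shown as sre_constants.error in older tracebacks (e.g. Python 3.4)
        return True
    return False

# As a regex *pattern*, r"\n" means "a newline character"; sample contains none,
# so this substitution changes nothing.
unchanged = re.sub(r"\n", "\n", sample)

# Escaping the backslash in the pattern matches the literal backslash followed by n.
converted = re.sub(r"\\n", "\n", sample)

print(lone_backslash_replacement_fails())  # True
print(unchanged == sample)                 # True
print(repr(converted))                     # 'for(;;)\n{\n\\t$0\n}'
```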
I played around in the interpreter to see if I could figure out a solution, but everything behaved as I'd (naively) expect.
$ python3
Python 3.4.0 (default, Mar 24 2014, 02:28:52)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> test = "for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}"
>>> import re
>>> test = "for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}"
>>> print(re.sub(r"\n", "\n", test))
for(int ${1:i}; $1 < ${2:STOP}; ++$1)
{
	$0
}
>>> print(test)
for(int ${1:i}; $1 < ${2:STOP}; ++$1)
{
	$0
}
>>> test
'for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}'
>>> t2 = re.sub(r"\n", "foo", test)
>>> t2
'for(int ${1:i}; $1 < ${2:STOP}; ++$1)foo{foo\t$0foo}'
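The interpreter session looks like it works because test is written as a normal (non-raw) string literal, so Python has already turned \n and \t into real newline and tab characters before re.sub runs; replacing a newline with a newline is then invisible. A string read from a file gets no such processing. A small sketch, with io.StringIO standing in for the real file (an assumption for illustration):

```python
import io

# Typed as a literal: the compiler turns \n and \t into real control characters.
typed = "a\n\tb"

# Read from a "file": the backslashes survive as ordinary characters.
read = io.StringIO(r"a\n\tb").read()

print(len(typed), len(read))  # 4 6
print(repr(typed))            # 'a\n\tb'
print(repr(read))             # 'a\\n\\tb'
```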
As for actually writing to the file, I have
with open(os.path.join(target_path, name), "w") as out:
    out.write(snippet)
I've also tried using print(snippet, end="", file=out) instead.
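For what it's worth, out.write(snippet) and print(snippet, end="", file=out) should write the same text: print converts its single argument with str() (a no-op for a str) and, with end="", appends nothing. A quick check, using io.StringIO buffers instead of the real target_path file (an assumption for illustration):

```python
import io

snippet = "for(;;)\n{\n\t$0\n}"  # hypothetical snippet containing real newline/tab characters

via_write = io.StringIO()
via_write.write(snippet)

via_print = io.StringIO()
print(snippet, end="", file=via_print)  # end="" suppresses the trailing newline

print(via_write.getvalue() == via_print.getvalue())  # True
```

If the two outputs differ in practice, the difference is more likely in the string being written than in the writing method.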
Edit: I've looked at similar questions like "Python how to replace backslash with re.sub()" and "How to write list of strings to file, adding newlines?", but those solutions don't quite work, and I'd really like to do this with a regex if possible, because regexes seem like a more powerful tool than Python's standard string-processing functions.
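A regex can do the targeted version (only \\, \n and \t) in a single pass. The following is a sketch, not code from the question or the answer; the names translate_escapes, _ESCAPES and _ESCAPE_RE are hypothetical. The pattern matches a backslash followed by one of the three allowed characters, and the replacement is a function, whose return value re.sub inserts verbatim, so no replacement-template escaping is needed:

```python
import re

# Only these three escapes are translated; the key is the character after the backslash.
_ESCAPES = {"\\": "\\", "n": "\n", "t": "\t"}
_ESCAPE_RE = re.compile(r"\\([\\nt])")

def translate_escapes(snippet: str) -> str:
    # A callable replacement's result is inserted as-is, so backslashes need no escaping.
    return _ESCAPE_RE.sub(lambda m: _ESCAPES[m.group(1)], snippet)

print(translate_escapes(r"insert just one backslash: \\ (that's it)"))
# insert just one backslash: \ (that's it)
print(repr(translate_escapes(r"what about a bell \a like that?")))
# 'what about a bell \\a like that?'
print(repr(translate_escapes(r"for(;;)\n{\n\t$0\n}")))
# 'for(;;)\n{\n\t$0\n}'
```

Because the string is scanned once from left to right, an escaped backslash followed by n (the three characters \\n in the file) becomes a backslash and an n rather than a newline. Running three separate re.sub calls in sequence can get such cases wrong, since a later call sees the output of an earlier one.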
Edit2: Not sure if this helps, but I thought I'd try to print what's going on in the function:
def escape_parser(snippet):
    print(snippet)
    print("{!r}".format(snippet))
    # snippet = re.sub(r"\\", "\\", snippet)
    snippet = re.sub(r"\t", "\t", snippet)
    snippet = re.sub(r"\n", "\n", snippet)
    print(snippet)
    print("{!r}".format(snippet))
    return snippet
yields
for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}
'for(int ${1:i}; $1 < ${2:STOP}; ++$1)\\n{\\n\\t$0\\n}'
for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}
'for(int ${1:i}; $1 < ${2:STOP}; ++$1)\\n{\\n\\t$0\\n}'
Edit3: Changing snippet = re.sub(r"\\", "\\", snippet) to snippet = re.sub(r"\\", r"\\", snippet) as per @BrenBarn's advice, and adding a test string to my source file yields
insert just one backslash: \\ (that's it)
"insert just one backslash: \\\\ (that's it)"
insert just one backslash: \\ (that's it)
"insert just one backslash: \\\\ (that's it)"
So I must have missed something obvious. It's a good thing one doesn't need a license to program.
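The Edit3 output is unchanged because r"\\" as a pattern matches one backslash and r"\\" as a replacement inserts one backslash, so every backslash is replaced by itself. Collapsing a doubled backslash into a single one needs a pattern that matches two backslashes. A minimal sketch, reusing the Edit3 test string:

```python
import re

line = r"insert just one backslash: \\ (that's it)"  # contains two backslashes

# Pattern r"\\" matches ONE backslash; replacement r"\\" inserts ONE backslash: a no-op.
same = re.sub(r"\\", r"\\", line)

# Pattern r"\\\\" matches TWO consecutive backslashes; each pair becomes one backslash.
collapsed = re.sub(r"\\\\", r"\\", line)

print(same == line)  # True
print(collapsed)     # insert just one backslash: \ (that's it)
```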
Edit4: As per "Process escape sequences in a string in Python", I changed escape_parser to this:
def escape_parser(snippet):
    print("pre-escaping: '{}'".format(snippet))
    # snippet = re.sub(r"\\", r"\\", snippet)
    # snippet = re.sub(r"\t", "\t", snippet)
    # snippet = re.sub(r"\n", "\n", snippet)
    snippet = bytes(snippet, "utf-8").decode("unicode_escape")
    print("post-escaping: '{}'".format(snippet))
    return snippet
which works, in a sense. My original intention was to replace only \\, \n, and \t, but this goes further than that, which isn't exactly what I wanted. Here's how things look after being run through the function. (It appears print and write behave the same for these; I may have been mistaken about print and write not matching up, because the editor I was using to inspect the output files apparently didn't refresh when the files changed.)
pre-escaping: 'for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}'
post-escaping: 'for(int ${1:i}; $1 < ${2:STOP}; ++$1)
{
	$0
}'
pre-escaping: 'insert just one backslash: \\ (that's it)'
post-escaping: 'insert just one backslash: \ (that's it)'
pre-escaping: 'source has one backslash \ <- right there'
post-escaping: 'source has one backslash \ <- right there'
pre-escaping: 'what about a bell \a like that?'
post-escaping: 'what about a bell like that?'
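The Edit4 approach decodes every escape sequence that Python recognizes in string literals, not only \\, \n and \t, which is why \a above became an invisible BEL character. The unicode_escape codec also reads bytes that are not part of an escape as Latin-1, so UTF-8 text outside ASCII comes back garbled. A short sketch of both effects, using a hypothetical helper decode_like_edit4 that wraps the same one-liner:

```python
def decode_like_edit4(snippet: str) -> str:
    # Same technique as Edit4: encode to UTF-8, then decode with the unicode_escape codec.
    return bytes(snippet, "utf-8").decode("unicode_escape")

print(repr(decode_like_edit4(r"tab\there")))  # 'tab\there'
print(repr(decode_like_edit4(r"bell \a")))    # 'bell \x07'
print(repr(decode_like_edit4("café\\n")))     # 'cafÃ©\n'
```

For pure-ASCII snippets the Latin-1 issue does not arise, but the extra escapes are still decoded.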
It's hard to tell if this is your main problem without seeing some data, but one problem is that you need to change your first replace to:
snippet = re.sub(r"\\", r"\\", snippet)
The reason is that backslashes have meaning in the replacement pattern as well (for group backreferences), so a single backslash is not a valid replacement string.
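To make the point about the replacement string concrete: in the repl argument of re.sub, a backslash starts an escape such as a group reference (\1, \g<name>) or \n, so a literal backslash has to be written as an escaped backslash in the template, i.e. r"\\". Passing a function as the replacement bypasses template processing entirely. A minimal sketch with made-up inputs:

```python
import re

# A backslash in the template starts an escape: here \2 and \1 are group references.
swapped = re.sub(r"(\w)-(\w)", r"\2-\1", "a-b")

# r"\\" in the template is an escaped backslash and inserts one literal backslash.
via_template = re.sub("x", r"\\", "axb")

# A callable's return value is used verbatim, so "\\" (one backslash) works here.
via_function = re.sub("x", lambda m: "\\", "axb")

print(swapped)                       # b-a
print(via_template == via_function)  # True
print(via_template)                  # a\b
```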
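Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.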