
Unescape strings in Python

I have an input file that contains a list of inputs, one per line. Each line of input is enclosed in double quotes. The inputs sometimes contain backslashes or a few escaped double quotes inside the enclosing double quotes (see the examples below).

Sample inputs:

"each line is enclosed in double-quotes"
"Double quotes inside a \"double-quoted\" string!"
"This line contains backslashes \\not so cool\\"
"too many double-quotes in a line \"\"\"too much\"\"\""
"too many backslashes \\\\\\\"horrible\"\\\\\\"

I would like to take the above inputs and simply convert the escaped double quotes in each line to a backtick (`).

I assume there is a straightforward one-line solution to this. I tried the following, but it doesn't work. Any other one-liner, or a fix to the code below, would be greatly appreciated.

import re

def fix(line):
    # Replace every backslash followed by a double quote with a backtick.
    return re.sub(r'\\"', '`', line)

It fails for input lines 3 and 5:

"each line is enclosed in double-quotes"
"Double quotes inside a `double-quoted` string!"
"This line contains backslashes \\not so cool\`
"too many double-quotes in a line ```too much```"
"too many backslashes \\\\\\`horrible`\\\\\`

Any fix I can think of breaks other lines. Please help!

This is not quite what you asked for, since it replaces with " rather than `, but I'll mention it anyway: you could always lean on the csv module to do the \" conversion correctly for you:

>>> import csv
>>> for line in csv.reader(["each line is enclosed in double-quotes",
...                         "Double quotes inside a \"double-quoted\" string!",
...                         "This line contains backslashes \\not so cool\\",
...                         "too many double-quotes in a line \"\"\"too much\"\"\"",
...                         "too many backslashes \\\\\\\"horrible\"\\\\\\",
...                         ]):
...     print(line)
...
['each line is enclosed in double-quotes']
['Double quotes inside a "double-quoted" string!']
['This line contains backslashes \\not so cool\\']
['too many double-quotes in a line """too much"""']
['too many backslashes \\\\\\"horrible"\\\\\\']

If it is then important that they be actual backticks, you could simply do a replace on the text returned by the csv module.
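When the lines come from a file rather than from Python string literals, the backslashes reach csv.reader unprocessed, so the reader probably needs to be told about the escape character. The following is a minimal sketch of that idea, not a definitive implementation. The function name unescape_to_backticks and the raw_lines list are made up for illustration, and the sketch assumes each line is a single double-quoted field. Note that it also strips the enclosing quotes and collapses each escaped backslash to a single one, which may or may not be what is wanted.

```python
import csv

def unescape_to_backticks(lines):
    """Yield each line's text with escapes resolved and quotes as backticks."""
    # escapechar tells the reader that a backslash protects the next
    # character; doublequote=False stops "" from being read as one quote.
    reader = csv.reader(lines, escapechar='\\', doublequote=False)
    for row in reader:
        if row:
            # Assumption: the whole line is one quoted field, so row[0]
            # holds all of its text. Any quote left in it was escaped.
            yield row[0].replace('"', '`')

# Raw strings stand in for lines read from the input file.
raw_lines = [
    r'"each line is enclosed in double-quotes"',
    r'"Double quotes inside a \"double-quoted\" string!"',
    r'"This line contains backslashes \\not so cool\\"',
    r'"too many double-quotes in a line \"\"\"too much\"\"\""',
    r'"too many backslashes \\\\\\\"horrible\"\\\\\\"',
]

for text in unescape_to_backticks(raw_lines):
    print(text)
```

With a real file, the open file object could be passed in place of raw_lines, since csv.reader accepts any iterable of lines.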

Add a + after the backslash:

return re.sub(r'\\+"', '`', line)
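One caveat with the \\+" pattern: on sample lines 3 and 5 the run of backslashes sits directly before the closing quote, so the closing quote is consumed as well, and on line 5 the backslashes in front of the escaped quotes disappear too. A possible alternative, sketched here under the assumption that a doubled backslash is an escaped backslash, is to match escaped backslashes and escaped quotes in a single pass and only rewrite the latter. The names _ESCAPE and samples are illustrative only.

```python
import re

# Match either an escaped backslash (two backslashes, captured in group 1)
# or an escaped double quote (one backslash followed by a quote).
_ESCAPE = re.compile(r'(\\\\)|\\"')

def fix(line):
    # Escaped backslashes are put back unchanged; each escaped quote
    # becomes a backtick. The enclosing quotes never match.
    return _ESCAPE.sub(lambda m: m.group(1) or '`', line)

# Raw strings stand in for lines read from the input file.
samples = [
    r'"each line is enclosed in double-quotes"',
    r'"Double quotes inside a \"double-quoted\" string!"',
    r'"This line contains backslashes \\not so cool\\"',
    r'"too many double-quotes in a line \"\"\"too much\"\"\""',
    r'"too many backslashes \\\\\\\"horrible\"\\\\\\"',
]

for s in samples:
    print(fix(s))
```

Because the regex scans left to right and consumes backslashes in pairs first, a quote is only treated as escaped when an odd number of backslashes precedes it.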
