Python: read regexps from JSON

I have a JSON file that stores a mapping containing regexes, like the ones below:

"F(\\d)": "field-\\\\1",
"FLR[ ]*(\\w)": "floor-\\\\1",

To comply with the JSON standard I escape the backslashes; the actual regexps should contain \d , \w , and \\1 .

Once I read this JSON with json.load() I still need to post-process the resulting dictionary to get correct regexps. I need to substitute a \\ with \ . What's the best way to do this?

So far I have tried both re.sub() and str.replace() , and in both cases it's not clear how to represent a single backslash in the substitution.

For example, I don't understand why the following doesn't produce a single backslash:

In [76]: "\\\\d".replace("\\\\", "\\")
Out[76]: '\\d'

It does produce a single backslash; the REPL shows the string's repr, in which that backslash is escaped. This is done so that characters with no non-escaped way to display them can still be printed unambiguously. Otherwise, you wouldn't know whether a backslash was meant to escape the following character or not.

This can be demonstrated by checking the individual characters:

# In a terminal/REPL:
>>> "\\\\d".replace("\\\\", "\\")[0]
'\\'
>>> "\\\\d".replace("\\\\", "\\")[1]
'd'
>>> "\\\\d".replace("\\\\", "\\")[2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
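
Another way to see the same thing, as a minimal sketch: compare what repr() returns (which is what the REPL echoes) with what print() writes and with the string's length. The variable name s is only for illustration.

```python
# Same expression as in the question: two backslashes replaced by one.
s = "\\\\d".replace("\\\\", "\\")

print(repr(s))  # '\\d'  (the escaped form the REPL echoes)
print(s)        # \d     (the actual characters)
print(len(s))   # 2      (one backslash plus the letter d)
```

The doubled backslash exists only in the displayed representation, not in the string itself.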

One tip for doing regexes in Python: use raw strings. If you put an r before the first quote of a string literal, backslashes won't escape anything (except that a backslash before a quote still keeps that quote from ending the literal). r"\n" is a string containing two characters, a \ and an n , equivalent to "\\n" . Raw strings are very helpful when working with regexes and anything else where you need to pass escape sequences through unchanged. See also: What exactly do "u" and "r" string flags do in Python, and what are raw string literals?
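
Tying this back to the original question, here is a hedged sketch of loading such a mapping and applying it. It assumes the file escapes each backslash exactly once; the JSON text, the apply_mapping helper and the sample inputs are made up for illustration. The JSON parser already turns each escaped backslash into a single one, so under that assumption no post-processing of the dictionary should be needed.

```python
import json
import re

# Hypothetical file contents. The raw string keeps Python from interpreting
# the backslashes, so this is exactly the text the JSON parser would see.
raw_json = r'''
{
    "F(\\d)": "field-\\1",
    "FLR[ ]*(\\w)": "floor-\\1"
}
'''

mapping = json.loads(raw_json)  # each JSON "\\" becomes one backslash


def apply_mapping(text, mapping):
    """Apply every pattern/replacement pair in turn (illustrative helper)."""
    for pattern, replacement in mapping.items():
        text = re.sub(pattern, replacement, text)
    return text


print(apply_mapping("F3", mapping))     # field-3
print(apply_mapping("FLR 2", mapping))  # floor-2

# With four backslashes in the JSON, the loaded replacement is \\1, which
# re.sub treats as a literal backslash followed by "1", not a backreference.
over_escaped = json.loads(r'{"F(\\d)": "field-\\\\1"}')
print(apply_mapping("F3", over_escaped))  # field-\1
```

If the replacement values in the real file use four backslashes, as in the question's sample, reducing them to two in the file is probably simpler than rewriting the dictionary after loading.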
