How to prevent Python from escaping special characters when reading a regex from a text file?

I am reading a text file in Python that, among other things, contains pre-written regexes that will be used for matching later on. The text file has the following format:

...

--> Task 2

Concatenate and print the strings "Hello, " and "world!" to the screen.

--> Answer

Hello, world!

print(\\"Hello,\\s\\"\\s*+\\s*\\"world!\\")

--> Hint 1

You can concatenate two strings with the + operator

...
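For illustration, here is one way a file in this layout could be split into sections. This is a minimal sketch, not the asker's code: it assumes every section starts with a line of the form "--> Name", that blank lines can be skipped, and it uses the corrected pattern from the answer below as sample data. The names `parse_sections` and `HEADER` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical header matcher for lines such as "--> Task 2" or "--> Hint 1".
HEADER = re.compile(r"^-->\s*(?P<name>.+?)\s*$")


def parse_sections(lines):
    """Group non-blank lines under their most recent '--> Name' header."""
    sections = defaultdict(list)
    current = None
    for raw in lines:
        line = raw.rstrip("\n")  # drop only the newline; backslashes stay as-is
        match = HEADER.match(line)
        if match:
            current = match.group("name")
        elif current is not None and line.strip():
            sections[current].append(line)
    return dict(sections)


# Sample content, written as a raw string so each backslash is a single character,
# exactly as it would be stored on disk.
sample = r'''--> Task 2
Concatenate and print the strings "Hello, " and "world!" to the screen.
--> Answer
Hello, world!
\"Hello,\s\"\s*\+\s*\"world!\"
--> Hint 1
You can concatenate two strings with the + operator
'''

sections = parse_sections(sample.splitlines(keepends=True))
print(sections["Answer"][1])  # the regex line, with single backslashes
```

With a real file, `parse_sections(open(filename, encoding="utf-8"))` would work the same way, since iterating a text file yields lines with their trailing newline.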

User input is accepted for each task and is either executed in a subprocess to check its return value or matched against a regex. The issue, though, is that Python's file.readline() escapes all special characters in the regex string (i.e. backslashes), which gives me something that isn't useful.
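The two checking modes described here could look roughly like the sketch below. It is an assumption about the setup, not the asker's implementation: `run_submission` and `check_submission` are hypothetical names, the expected output is compared after stripping the trailing newline, and the regex is applied to the submitted source with `re.search`.

```python
import re
import subprocess
import sys


def run_submission(code, timeout=5):
    """Run user code in a separate Python interpreter; return (returncode, stdout)."""
    # A subprocess isolates the user's code from this process, but it is not a sandbox.
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return -1, ""
    return result.returncode, result.stdout


def check_submission(code, expected_output=None, pattern=None):
    """Hypothetical grader: accept if the source matches the regex or the output matches."""
    if pattern is not None and re.search(pattern, code):
        return True
    if expected_output is not None:
        returncode, stdout = run_submission(code)
        return returncode == 0 and stdout.rstrip("\n") == expected_output
    return False


submission = 'print("Hello, " + "world!")'
pattern = r'\"Hello,\s\"\s*\+\s*\"world!\"'
print(check_submission(submission, pattern=pattern))                  # True
print(check_submission(submission, expected_output="Hello, world!"))  # True
```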

I tried reading the file as bytes and decoding each line with the 'raw_unicode_escape' codec (described as producing "a string that is suitable as raw Unicode literal in Python source code"), but no dice:

with open(filename, 'rb') as file:
    for line in file:
        # Decode each bytes line, hoping to keep the backslashes "raw"
        line = line.decode('raw_unicode_escape')
        ...
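A quick check suggests why this attempt changes nothing: when decoding, `raw_unicode_escape` only interprets `\uXXXX` and `\UXXXXXXXX` sequences and leaves every other backslash alone, so for a plain ASCII line it produces the same string as a UTF-8 decode. The byte string below is an assumed stand-in for one line of the file.

```python
# Assumed bytes of one line as read with open(filename, "rb"): single backslashes.
line = rb'\"Hello,\s\"\s*\+\s*\"world!\"' + b"\n"

as_raw = line.decode("raw_unicode_escape")
as_utf8 = line.decode("utf-8")

# Only \u / \U sequences are rewritten, so both decodings agree here.
print(as_raw == as_utf8)   # True
print(as_raw.count("\\"))  # 8 single backslash characters

# The one thing this codec does change: \uXXXX becomes the actual character.
print(rb"\u00e9".decode("raw_unicode_escape") == "\u00e9")  # True
```

Non-ASCII bytes are decoded as Latin-1 by `raw_unicode_escape`, so for a UTF-8 text file, decoding with `utf-8` (or opening in text mode with `encoding='utf-8'`) is the safer choice.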

Am I going about this completely the wrong way?

Thanks for any and all help.

PS: I also found this question: Issue while reading special characters from file. However, I still have the same problem when I use open(filename, 'r', encoding='utf-8').

Python regex patterns are just plain old strings, so there should be no problem storing them in a file. Perhaps when you use file.readline() you are seeing escaped characters because you are looking at the repr of the line? That is not an issue when you actually use the pattern as a regex:

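import re
filename = '/tmp/test.txt'
# Write the pattern exactly as it should be stored: a raw string, single backslashes
with open(filename, 'w') as f:
    f.write(r'\"Hello,\s\"\s*\+\s*\"world!\"')

with open(filename, 'r') as f:
    pat = f.readline()
    print(pat)        # str form: the characters actually in the string
    # \"Hello,\s\"\s*\+\s*\"world!\"
    print(repr(pat))  # repr form: each backslash is displayed doubled
    # '\\"Hello,\\s\\"\\s*\\+\\s*\\"world!\\"'
    assert re.search(pat, '  "Hello, " +   "world!"')  # Shows match was found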
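Two related details can make a correctly read pattern look or behave wrong: printing a list shows each element's repr (so backslashes appear doubled), and readline()/readlines() keep the trailing newline, which then becomes a literal part of the pattern. A small sketch, with io.StringIO standing in for the real file and a hypothetical submission string:

```python
import io
import re

# io.StringIO stands in for open(filename, encoding="utf-8"); the content is
# what would be on disk, with single backslashes.
f = io.StringIO("Hello, world!\n" + r'\"Hello,\s\"\s*\+\s*\"world!\"' + "\n")
lines = f.readlines()

print(lines)     # list display uses repr(), so backslashes show up doubled
print(lines[1])  # printing the string itself shows single backslashes

submission = 'print("Hello, " + "world!")'

# Unstripped, the pattern ends with a literal "\n", which the submission lacks.
print(bool(re.search(lines[1], submission)))  # False

# Strip the newline before compiling and the match succeeds.
pattern = re.compile(lines[1].rstrip("\n"))
print(bool(pattern.search(submission)))  # True
```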

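Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.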

 
© 2020-2024 STACKOOM.COM