Why does my Unicode regex pattern not work?
import re
file = open('C:\item.bh.txt', 'r', encoding = 'utf-16')
pattern = re.findall(ur'[\u09ac][\u0995]', file)
It shows the following error:
  File "<ipython-input-22-bbd94837f9ee>", line 1
    pattern = re.findall(ur'[\u09ac][\u0995]', file)
                                            ^
SyntaxError: invalid syntax
It doesn't make sense to use a raw Unicode string here, because you want the \uXXXX escape sequences to be interpreted. The ur prefix is also not valid syntax in Python 3, which is what raises the SyntaxError. Second, re.findall takes a string, not a file object, so you have to read the file first. Finally, the character classes are not needed because each contains only a single character.
re.findall(u'\u09ac\u0995', file.read())
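To make the points above concrete, here is a minimal sketch that runs the pattern against an in-memory string instead of the file. The sample text is an assumption made up for illustration, since the contents of the original file are not shown.

```python
import re

# Hypothetical sample text; the original file's contents are not shown.
sample = "\u09ac\u0995 abc \u09ac\u0995"

# In a normal (non-raw) string literal, Python turns each \uXXXX escape
# into a single character before the regex engine sees the pattern.
pattern = "\u09ac\u0995"

# re.findall receives a str and returns every non-overlapping match.
matches = re.findall(pattern, sample)

# Single-character classes match exactly the same text, so they add nothing.
matches_with_classes = re.findall("[\u09ac][\u0995]", sample)
```

Here pattern is a two-character string, and both calls return the same list of matches, which is why the brackets can be dropped.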
Or in context:
import re
file = open(r'C:\item.bh.txt', 'r', encoding = 'utf-16')
pattern = re.findall(u'\u09ac\u0995', file.read())
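As a variation, the same idea can be sketched with a with block so the file is closed automatically. The helper name find_in_file and the temporary demo file are assumptions for illustration only; they stand in for the real C:\item.bh.txt, whose contents are unknown.

```python
import os
import re
import tempfile


def find_in_file(path, pattern, encoding="utf-16"):
    """Return all non-overlapping matches of pattern in the decoded file text."""
    # The with block closes the file even if reading or matching fails.
    with open(path, "r", encoding=encoding) as handle:
        return re.findall(pattern, handle.read())


# Demo on a throwaway UTF-16 file (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")
    with open(demo_path, "w", encoding="utf-16") as handle:
        handle.write("\u09ac\u0995 and \u09ac\u0995")
    found = find_in_file(demo_path, "\u09ac\u0995")
```

Reading the whole file with read() is fine for small inputs; for very large files it may be better to search line by line.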