Python regex bug with Unicode characters?

Question

Long story short:

>>> import re
>>> re.compile(r"\w*").match(u"Français")
<_sre.SRE_Match object at 0x1004246b0>
>>> re.compile(r"^\w*$").match(u"Français")
>>> re.compile(r"^\w*$").match(u"Franais")
<_sre.SRE_Match object at 0x100424780>

Why doesn't the pattern match the string containing Unicode characters once ^ and $ are in the regex? As far as I understand, ^ stands for the beginning of the string (line) and $ for the end of it.

Answer 1

You need to specify the UNICODE flag; otherwise \w is just equivalent to [a-zA-Z0-9_], which does not include the character 'ç'.

>>> re.compile(r"^\w*$", re.U).match(u"Fran\xe7ais")
<_sre.SRE_Match object at 0x101474168>
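
This also explains why the unanchored pattern in the question appeared to work: match() only needs a prefix, so \w* stops at "Fran" and still reports success. The sketch below reproduces both effects. It assumes Python 3, where str patterns are Unicode-aware by default, so re.ASCII stands in for the Python 2 default shown above. The names WORD, ascii_only, unicode_aware and prefix are illustrative only.

```python
import re

# "Français", written with an escape so the source encoding does not matter.
WORD = "Fran\xe7ais"

# ASCII-only word class: behaves like [a-zA-Z0-9_], as in the Python 2 default.
ascii_only = re.compile(r"^\w*$", re.ASCII)

# Unicode-aware word class: accented letters count as word characters.
unicode_aware = re.compile(r"^\w*$", re.UNICODE)

# Unanchored ASCII-only pattern: succeeds, but only on the leading ASCII run.
prefix = re.compile(r"\w*", re.ASCII)

print(ascii_only.match(WORD) is not None)     # False: the accented letter blocks the anchored match
print(unicode_aware.match(WORD) is not None)  # True
print(prefix.match(WORD).group())             # Fran
```

Checking .group() rather than just the truthiness of the match object is a quick way to see how much of the input a pattern actually consumed.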

Answer 1: 5, accepted, 2010-08-31 08:36:54
