Python Regex strange behavior with accented characters
I was experimenting with some Python (2.7.3) regex and came across behavior I did not expect. In the block of code below, the check returns False when checking against the "ß" character, as well as against other accented characters like "Å", "Í", etc.
In addition to returning False for the "ø" character, it also returns False for other accented characters such as "å", "Å", "ç", "Ç", "Â", etc.
Case in point: I'm not sure where the problem stems from when dealing with accented characters versus other characters like "¥", which it has no problem with. They all have different Unicode/UTF-8 values (UTF-8 is what my encoding is set to), so I'm not sure where the difference lies.
# -*- coding: utf-8 -*-
import re

def regex_check(name):
    pattern = '[^ß]'
    if re.match(pattern, str(name), re.IGNORECASE):
        return True
    else:
        return False

print regex_check("ø")
Am I missing something obvious? Thanks for the help.
Normal strings are bytes in Python 2, so you should use the u'...' prefix to treat them as Unicode strings.
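To see why "¥" behaves differently from the accented letters, here is a minimal sketch. It is written in Python 3 (an assumption: the question uses Python 2.7, and Python 3 is used here only because it keeps bytes and text as separate types) and emulates a Python 2 plain string literal by encoding text to UTF-8 bytes. The helper names utf8_bytes, byte_regex_check and unicode_regex_check are hypothetical. In UTF-8, "ß" is the two bytes C3 9F, so the byte pattern [^ß] is a class that rejects either of those bytes individually. Since re.match only inspects the start of the input, and every accented letter in the question also begins with byte C3, the match fails on the first byte; "¥" (C2 A5) does not begin with C3, so it matches.

```python
import re

def utf8_bytes(ch):
    """Return the UTF-8 encoding of a string as a bytes object."""
    return ch.encode("utf-8")

# Hypothetical emulation of Python 2's '[^ß]' in a UTF-8 source file:
# b'[^\xc3\x9f]' is a negated class over the two *bytes* of 'ß',
# not over the single character 'ß'.
byte_pattern = b"[^" + utf8_bytes("ß") + b"]"

def byte_regex_check(name):
    """Mimic the question's code: byte pattern matched against UTF-8 bytes."""
    return re.match(byte_pattern, utf8_bytes(name), re.IGNORECASE) is not None

def unicode_regex_check(name):
    """Mimic the answer's fix: text (Unicode) pattern matched against text."""
    return re.match("[^ß]", name, re.IGNORECASE) is not None

# Show each character's UTF-8 bytes and the result of both checks.
for ch in ["ø", "å", "Å", "ç", "Ç", "Â", "Í", "¥"]:
    print(ch, utf8_bytes(ch).hex(), byte_regex_check(ch), unicode_regex_check(ch))
```

Every accented letter prints a hex value starting with c3 and False for the byte check, while "¥" prints c2a5 and True; the Unicode check returns True for all of them.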
# -*- coding: utf-8 -*-
import re

def regex_check(name):
    pattern = u'[^ß]'  # use u'...' here
    if re.match(pattern, name, re.IGNORECASE):
        return True
    else:
        return False

print regex_check(u"ø")  # use u'...' here
Output:
True
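For comparison, a sketch of the same check in Python 3, assuming the intent of the original function is unchanged. Python 3 string literals are already Unicode text, so the u'...' prefix is unnecessary (it is still accepted), and wrapping re.match in bool() replaces the if/else.

```python
import re

def regex_check(name: str) -> bool:
    """Return True if name starts with a character other than 'ß' (case-insensitive)."""
    # In Python 3, "..." literals are Unicode text, so no u'...' prefix is needed.
    return bool(re.match("[^ß]", name, re.IGNORECASE))

print(regex_check("ø"))  # prints True
print(regex_check("ß"))  # prints False
```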