Strange Python regex behavior with accented characters

Question

I was experimenting with Python (2.7.3) regular expressions and came across behavior I did not expect. In the code below, a check against a pattern containing the "ß" character returns False, and the same happens with other accented characters such as "Å", "Í", etc.

Besides returning False for the "ø" character, it also returns False for other accented characters such as "å", "Å", "ç", "Ç", "Â", etc.

In short, I'm not sure where the problem comes from when dealing with accented characters, as opposed to characters like "¥", which work fine. They all have different Unicode/UTF-8 values (UTF-8 is what my encoding is set to), so I don't see where the difference lies.

# -*- coding: utf-8 -*-
import re

def regex_check(name):
    pattern = '[^ß]'
    if re.match(pattern, str(name), re.IGNORECASE):
        return True
    else:
        return False

print regex_check("ø")  # prints False

Am I missing something obvious? Thanks for the help.
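The difference becomes visible at the byte level. Below is a short editorial sketch that prints the code point and UTF-8 bytes of each character mentioned above. It assumes Python 3 (not the question's 2.7), and `utf8_hex` is a hypothetical helper introduced only for illustration.

```python
# Editorial sketch (assumes Python 3): show the UTF-8 bytes behind each character.
def utf8_hex(ch):
    """Return the UTF-8 encoding of ch as space-separated hex bytes."""
    return " ".join("%02x" % b for b in ch.encode("utf-8"))


for ch in ["ß", "ø", "Å", "Í", "å", "ç", "Ç", "Â", "¥"]:
    # Code point, then the raw bytes a Python 2 byte string would contain.
    print(ch, "U+%04X" % ord(ch), utf8_hex(ch))
```

Every accented letter in the question lies in U+00C0 to U+00FF, so its UTF-8 form starts with the byte c3, the same first byte as ß (c3 9f). "¥" (U+00A5) is c2 a5 instead. Read as bytes, '[^ß]' is a class that rejects the bytes c3 and 9f, so `re.match` fails on the first byte of any character beginning with c3 and succeeds for "¥".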

Answer 1

Plain string literals are byte strings in Python 2; use the u'...' prefix to make them Unicode strings.

# -*- coding: utf-8 -*-
import re

def regex_check(name):
    pattern = u'[^ß]'  # use u'...' here
    if re.match(pattern, name, re.IGNORECASE):
        return True
    else:
        return False

print regex_check(u"ø")  # use u'...' here

Output:

True
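For comparison, here is a sketch of both behaviors under Python 3 (an assumption; the thread targets 2.7). In Python 3 a plain "..." literal is already a Unicode `str`, playing the role of Python 2's u"...", while `bytes` plays the role of Python 2's plain `str`. The names `str_regex_check` and `bytes_regex_check` are hypothetical.

```python
import re

# Assumption: Python 3, where str is Unicode text and bytes mirrors Python 2's str.


def str_regex_check(name):
    """Character-level check, equivalent to the answer's u'...' version."""
    return re.match("[^ß]", name, re.IGNORECASE) is not None


def bytes_regex_check(name):
    """Byte-level check, equivalent to the question's original version."""
    pattern = "[^ß]".encode("utf-8")  # b"[^\xc3\x9f]": rejects bytes 0xC3 and 0x9F
    return re.match(pattern, name.encode("utf-8"), re.IGNORECASE) is not None


for ch in ["ø", "Å", "¥"]:
    print(ch, str_regex_check(ch), bytes_regex_check(ch))
```

This prints `ø True False`, `Å True False` and `¥ True True`. In Python 3 the u prefix is accepted but redundant (it was reinstated in 3.3), and mixing a `str` pattern with a `bytes` subject raises `TypeError` instead of silently matching bytes.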

Answer 1: score 3, accepted (2013-09-07 18:00:16)
