Python regex: strange behavior with accented characters

I was experimenting with some Python (2.7.3) regexes and came across behavior I did not expect. In the code block below, the function returns False when checking against the "ß" character as well as other accented characters like "Å", "Í", etc.

In addition to returning False for the "ø" character, it also returns False for other accented characters such as "å", "Å", "ç", "Ç", "Â", etc.

Case in point: I'm not sure why the problem shows up with accented characters but not with other characters like "¥", which work without any trouble. They all have different Unicode/UTF-8 values (UTF-8 is what my encoding is set to), so I'm not sure where the difference lies.

# -*- coding: utf-8 -*-
import re

def regex_check(name):
    pattern = '[^ß]'
    if re.match(pattern, str(name), re.IGNORECASE):
        return True
    else:
        return False

print regex_check("ø")  # prints False, although True was expected

Am I missing something obvious? Thanks for the help.

Normal strings are byte strings in Python 2. Use the u'...' prefix on both the pattern and the input so that they are treated as unicode strings.

# -*- coding: utf-8 -*-
import re

def regex_check(name):
    pattern = u'[^ß]'  # use u'...' here
    if re.match(pattern, name, re.IGNORECASE):
        return True
    else:
        return False

print regex_check(u"ø")  # use u'...' here

Output:

True
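
One way to see why "¥" passes while "ø" and "Å" fail is to look at the UTF-8 bytes involved. The sketch below assumes that both the source file and the input are UTF-8 encoded. It imitates Python 2 byte strings by encoding explicitly, so it also runs on Python 3. The helper names byte_level_check and unicode_check are made up for illustration, and re.IGNORECASE is left out for brevity.

```python
# -*- coding: utf-8 -*-
import re

def byte_level_check(text):
    # Imitates the Python 2 byte-string case: '[^ß]' stored as UTF-8
    # becomes the byte pattern [^\xc3\x9f], a class excluding two bytes.
    pattern = u'[^ß]'.encode('utf-8')
    return re.match(pattern, text.encode('utf-8')) is not None

def unicode_check(text):
    # With unicode strings, the class excludes the single character ß.
    return re.match(u'[^ß]', text) is not None

results = {}
for ch in [u'ø', u'Å', u'¥', u'ß']:
    results[ch] = (byte_level_check(ch), unicode_check(ch))

# (byte-level result, unicode result) for each character:
print(results[u'ø'])  # (False, True)
print(results[u'Å'])  # (False, True)
print(results[u'¥'])  # (True, True)
print(results[u'ß'])  # (False, False)
```

In UTF-8, "ß" is the two bytes \xc3\x9f, so the byte-level class rejects any input whose first byte is \xc3 or \x9f. "ø" (\xc3\xb8) and "Å" (\xc3\x85) both start with \xc3 and therefore fail, while "¥" (\xc2\xa5) starts with \xc2 and matches.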
