Regular Expression - character class for special characters
I need to write a regular expression in Python that will capture some text which could include any special character (like !@#$%^). Is there a character class similar to [\w] or [\d] that will capture any special character?
I could write out all the special characters in my regex, but it would end up unreadable. Any help appreciated.
If you're using Python 3, you might not have to do anything. \w is Unicode-aware by default and already matches many "special characters" (accented letters such as ü or é, though not punctuation such as !@#$%^):
>>> import re
>>> re.findall(r'\w', 'üäößéÅßêèiìí')
['ü', 'ä', 'ö', 'ß', 'é', 'Å', 'ß', 'ê', 'è', 'i', 'ì', 'í']
In Python 2.7, only i would be matched by \w by default:
>>> import re
>>> re.findall('\w', 'üäößéÅßêèiìí')
['i']
You could use re.UNICODE:
# encoding: utf-8
import re
any_char = re.compile('\w', re.UNICODE)
re.findall(any_char, u'üäößéÅßêèiìí')
# [u'\xfc', u'\xe4', u'\xf6', u'\xdf', u'\xe9', u'\xc5', u'\xdf', u'\xea', u'\xe8', u'i', u'\xec', u'\xed']
for x in re.findall(any_char, u'üäößéÅßêèiìí'):
    print x
# ü
# ä
# ö
# ß
# é
# Å
# ß
# ê
# è
# i
# ì
# í
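For comparison, Python 3 reverses the default: \w is Unicode-aware for str patterns, and the re.ASCII flag restricts it to [a-zA-Z0-9_], which reproduces the Python 2 result above. The following is a minimal sketch; the sample string is made up for illustration and is written with escapes so the code points are unambiguous.

```python
import re

# Sample text (illustrative): 'üäö i !@#'
text = '\u00fc\u00e4\u00f6 i !@#'

# Python 3 default for str patterns: \w matches Unicode letters and digits.
unicode_words = re.findall(r'\w', text)

# re.ASCII limits \w to [a-zA-Z0-9_], like the Python 2 default.
ascii_words = re.findall(r'\w', text, re.ASCII)

print(unicode_words)  # the three accented letters and 'i'
print(ascii_words)    # only 'i'
```

In both cases the punctuation !@# is skipped, so \w on its own does not cover the characters listed in the question.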
Specifying Unicode ranges might simplify your regex. As an example, this regex matches any character in the Unicode Arrows block (U+2190 to U+21FF):
>>> import re
>>> arrows = re.compile(r'[\u2190-\u21FF]')
>>> re.findall(arrows, "a⇸b⇙c↺d↣e↝f")
['⇸', '⇙', '↺', '↣', '↝']
For Python 2, you'd need to use unicode strings for both the regex and the input:
>>> import re
>>> arrows = re.compile(ur'[\u2190-\u21FF]')
>>> re.findall(arrows, u"a⇸b⇙c↺d↣e↝f")
[u'\u21f8', u'\u21d9', u'\u21ba', u'\u21a3', u'\u219d']
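If several ranges are needed, the character class can be assembled from code point pairs instead of being typed by hand. This is only a sketch for Python 3: char_class is a hypothetical helper name, the Mathematical Operators block (U+2200 to U+22FF) is added purely as a second example range, and the \uXXXX form used here only covers code points up to U+FFFF.

```python
import re

def char_class(*ranges):
    """Compile a character class from (first, last) code point pairs.

    Hypothetical helper; only handles code points up to U+FFFF.
    """
    parts = [f"\\u{lo:04X}-\\u{hi:04X}" for lo, hi in ranges]
    return re.compile("[" + "".join(parts) + "]")

# Arrows block plus Mathematical Operators block.
symbols = char_class((0x2190, 0x21FF), (0x2200, 0x22FF))

# Sample text: a, U+21F8 (arrow), b, U+2211 (summation), c, U+21BA (arrow),
# d, U+2260 (not equal), e
sample = "a\u21f8b\u2211c\u21bad\u2260e"
found = symbols.findall(sample)
print(found)
```

Only the four symbols are returned; the ASCII letters between them fall outside both ranges.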
You could try the negated versions (\W, \D), which match any non-word or non-digit character.
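A short sketch of how the negated classes behave in Python 3, using a made-up sample string. \W on its own also matches whitespace, so a negated set such as [^\w\s] is often closer to "special characters". The last pattern is a different approach, not part of the suggestion above: it builds an explicit ASCII-only class from string.punctuation, which avoids typing the characters by hand.

```python
import re
import string

text = 'Hello, World_1! @#$%^'

# \W: anything that is not a word character (spaces included).
non_word = re.findall(r'\W', text)

# [^\w\s]: neither a word character nor whitespace.
specials = re.findall(r'[^\w\s]', text)

# Explicit ASCII punctuation class; re.escape protects ], \, ^ and -.
ascii_punct = re.compile('[' + re.escape(string.punctuation) + ']')
punct = ascii_punct.findall(text)

print(non_word)  # [',', ' ', '!', ' ', '@', '#', '$', '%', '^']
print(specials)  # [',', '!', '@', '#', '$', '%', '^']
print(punct)     # [',', '_', '!', '@', '#', '$', '%', '^']
```

Note that the underscore counts as a word character, so only the string.punctuation version reports it.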