
Python regular expression: how can I filter only special characters?

I want to check whether given words contain a special character or not.
Below is my Python code.

The literal 'a@bcd' contains '@', so it is matched, which is correct.
But 'a1bcd' has no special character, and it was matched too!

import re
regexp = re.compile('[~`!@#$%^&*()-_=+\[\]{}\\|;:\'\",.<>/?]+')

if regexp.search('a@bcd'):
    print 'matched!! nice catch!!'

if regexp.search('a1bcd'):
    print 'something is wrong here!!!'

Result of running python ../special_char.py:

matched!! nice catch!!
something is wrong here!!!

I have no idea why it behaves this way. Could someone help me? Thanks.

Move the dash in your regular expression to the start of the [] group, like this:

regexp = re.compile('[-~`!@#$%^&*()_=+\[\]{}\\|;:\'\",.<>/?]+')

Where you had the dash, it was read together with its neighbours as )-_ and, because it is inside [], interpreted as a range from ) to _. In ASCII that range runs from code point 41 to 95, which includes the digits 0-9 and the uppercase letters, so the 1 in 'a1bcd' matched. If you move the dash to just after the [, it has no special meaning and simply matches itself.
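To see exactly which characters the accidental range covers, the code points between ) and _ can be enumerated. This is a minimal sketch in Python 3 syntax, not part of the original answer; the name accidental_range is only illustrative.

```python
# Enumerate every character covered by the accidental range ")-_".
start, end = ord(')'), ord('_')   # code points 41 and 95
accidental_range = ''.join(chr(c) for c in range(start, end + 1))

print(accidental_range)

# Digits and uppercase ASCII letters fall inside the range;
# lowercase letters start at code point 97 and fall outside it.
print('1' in accidental_range)
print('A' in accidental_range)
print('a' in accidental_range)
```

This is why 'a1bcd' matched while a purely lowercase word such as 'abcd' did not.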

Here's an interactive session showing the specific problem there was in your regular expression:

>>> import re
>>> print re.search('[)-_]', 'abcd')
None
>>> print re.search('[)-_]', 'a1b')
<_sre.SRE_Match object at 0x7f71082247e8>
>>> print re.search('[)-_]', 'a1b').group(0)
1

After fixing it:

>>> print re.search('[-)_]', 'a1b')
None
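Placing the dash first is not the only option: inside a character class a dash is also literal when it comes last or is escaped with a backslash. The following is a small sketch in Python 3 syntax comparing the three spellings; the names candidates and results are only illustrative.

```python
import re

# Three spellings of a class containing ")", "_" and a literal dash.
candidates = [r'[-)_]', r'[)_-]', r'[)\-_]']

results = {}
for pattern in candidates:
    # A digit must NOT match; an actual dash must match.
    matches_digit = re.search(pattern, 'a1b') is not None
    matches_dash = re.search(pattern, 'a-b') is not None
    results[pattern] = (matches_digit, matches_dash)
    print(pattern, matches_digit, matches_dash)
```

All three behave the same way, so the choice is a matter of readability.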

Unless there's some reason not visible in your question, I'd also say that the final + is not needed: search() already succeeds as soon as a single character from the class is found.
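Putting both suggestions together (dash first, no trailing +), the check could be wrapped as below. This is a sketch in Python 3 syntax under the assumption that the character list from the question is the intended definition of "special"; the names SPECIAL and has_special are hypothetical.

```python
import re

# Dash placed first so it is a literal; a raw string avoids double escaping.
SPECIAL = re.compile(r'''[-~`!@#$%^&*()_=+\[\]{}\\|;:'",.<>/?]''')

def has_special(word):
    """Return True if word contains at least one listed special character."""
    return SPECIAL.search(word) is not None

print(has_special('a@bcd'))
print(has_special('a1bcd'))
print(has_special('a-b'))
```

With the corrected class, 'a@bcd' and 'a-b' are reported as containing a special character and 'a1bcd' is not.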

re will be relatively slow for this.

I'd suggest trying (where word is the string to check)

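specialchars = '''-~`!@#$%^&*()_=+[]{}\\|;:'",.<>/?'''
# Python 2 str.translate: deleting the special characters shortens word if it contains any
len(word) != len(word.translate(None, specialchars))

or

# a non-empty (truthy) set means word contains at least one special character
set(word) & set(specialchars)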
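Note that word.translate(None, specialchars) only works on Python 2 byte strings. On Python 3 the same two ideas might look like the sketch below; the function names has_special_translate and has_special_set are hypothetical, and no timing claims are made here.

```python
specialchars = '''-~`!@#$%^&*()_=+[]{}\\|;:'",.<>/?'''

# Python 3: str.translate takes a table; the third argument of
# str.maketrans lists the characters to delete.
delete_table = str.maketrans('', '', specialchars)

def has_special_translate(word):
    # If deleting special characters shortens the word, it contained one.
    return len(word) != len(word.translate(delete_table))

# Build the set once instead of on every call.
special_set = frozenset(specialchars)

def has_special_set(word):
    # The sets are not disjoint exactly when they share a character.
    return not special_set.isdisjoint(word)

for w in ('a@bcd', 'a1bcd'):
    print(w, has_special_translate(w), has_special_set(w))
```

Both helpers return a plain bool, so they can be used directly in an if statement.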
