
Removing everything except words, digits and spaces using Python regex

Using Regex to remove everything except words, digits and spaces.

This is the function I defined:

import re

def remove(text):
    return re.sub(r'[^\w\d\s]', '', text)

Is there anything extra in the pattern, or anything I have missed?

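\w already matches letters ( [A-Za-z] ), digits ( \d ), and the underscore _ , so the \d in your character class is redundant and underscores are not removed. In Python 3, \w on a str pattern also matches non-ASCII letters and digits unless the re.ASCII flag is set.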
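To make the behaviour of \w concrete, here is a minimal sketch. The sample string and the variable names are made up for illustration; only re.sub and the re.ASCII flag come from the standard library.

```python
import re

# Hypothetical sample: contains an underscore, a non-ASCII letter (\u00e9), and punctuation
sample = "snake_case caf\u00e9 42!"

# Default (Unicode) matching: \w keeps letters, digits, the underscore, and the accented letter
unicode_result = re.sub(r'[^\w\s]', '', sample)

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so the accented letter is removed too
ascii_result = re.sub(r'[^\w\s]', '', sample, flags=re.ASCII)

print(unicode_result)  # snake_case café 42
print(ascii_result)    # snake_case caf 42
```

In both cases the underscore survives, which is why a pattern built on \w cannot be used to strip it.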

Regex101 demo for \w+

So if you also want to remove underscores, try this version, which uses an explicit character class instead of \w:

import re

def remove(text):
    return re.sub(r'[^A-Za-z\d\s]+', '', text)

Let me know if it doesn't work.
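As a rough side-by-side of the two patterns in this thread, the sketch below runs both on the same input. The function names and the sample string are hypothetical, chosen only to show where the results differ.

```python
import re

def remove_keep_underscore(text):
    # Pattern from the question: \w keeps letters, digits, and underscores
    return re.sub(r'[^\w\d\s]', '', text)

def remove_ascii_letters_only(text):
    # Explicit class: only A-Z, a-z, digits, and whitespace are kept
    return re.sub(r'[^A-Za-z\d\s]+', '', text)

# Hypothetical sample with an underscore, punctuation, and a non-ASCII letter (\u00e9)
sample = "user_name: Jos\u00e9 #42"

kept_underscore = remove_keep_underscore(sample)
ascii_only = remove_ascii_letters_only(sample)

print(kept_underscore)  # user_name José 42
print(ascii_only)       # username Jos 42
```

Which one is right depends on whether underscores and accented letters count as "words" for your data.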

Your approach will work. For example:

 import re

 text = ' !"(/£hello world1!!!!%"& '

 def remove(text):
   return re.sub(r'[^\w\d\s]', '', text)

 print(remove(text))

The output, which keeps the leading and trailing space of the input because \s is in the kept set, will be:

  hello world1
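Because the sample string starts and ends with a space, and whitespace is not removed by the pattern, the result keeps those spaces. A small sketch using repr makes them visible; the call to strip() is an optional extra step, not part of the original function.

```python
import re

def remove(text):
    return re.sub(r'[^\w\d\s]', '', text)

text = ' !"(/£hello world1!!!!%"& '
cleaned = remove(text)

print(repr(cleaned))          # ' hello world1 '
# Optional: drop the surrounding whitespace afterwards
print(repr(cleaned.strip()))  # 'hello world1'
```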


