
How to use re.sub() to keep only letters a-z, A-Z, digits 0-9 and spaces without splitting numbers?

message = 'Hello(/ how{can} wan\';t //opperate+32.5 u&# kj|'

I need to keep only the letters a-z and A-Z, the digits 0-9, and spaces, so the result should be 'Hello how can wan t opperate 325 u kj'. But when I use re.sub('[^\w\d]+', ' ', message) or re.sub('[^A-Za-z0-9]+', ' ', message), I get 'Hello how can wan t opperate 32 5 u kj'. How can I get 325 as a single number?

You can use

re.sub(r'(\d+(?:[,.]\d+)+)|[\W_]+', lambda x: x.group(1) if x.group(1) else ' ', message).strip()

See the Python demo online.
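For readers without access to the linked demo, here is a minimal self-contained run of the one-liner on the sample message from the question. The variable names pattern and cleaned are only illustrative.

```python
import re

message = 'Hello(/ how{can} wan\';t //opperate+32.5 u&# kj|'

# Group 1 captures digit runs joined by '.' or ','; the second branch
# matches any run of non-alphanumeric characters (including '_').
pattern = r'(\d+(?:[,.]\d+)+)|[\W_]+'

# Keep a captured number as written; replace every other match with a space.
cleaned = re.sub(pattern, lambda m: m.group(1) if m.group(1) else ' ', message).strip()
print(cleaned)  # Hello how can wan t opperate 32.5 u kj
```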

Details:

  • (\d+(?:[,.]\d+)+) - Capturing group 1: one or more digits followed by one or more occurrences of a . or , and one or more digits
  • | - or
  • [\W_]+ - one or more characters that are neither letters nor digits (\W does not match the underscore, so _ is listed explicitly).

If Group 1 matches, the replacement is the Group 1 value, so the number is kept as written, separators included; otherwise the replacement is a single space. A match at the start or end of the string leaves a stray space there, hence the strip() call.
