How to remove digits from inside a word, but keep a digit that stands alone, using regex in Python

This is what I have so far:

import re
match = re.sub(r'[0-9]', "", "th1s n33ds to be r3m0v3d and this 2 doesnt")

This removes ALL the digits in the sentence, including the standalone 2. I have tried everything I can think of. Does anyone have an idea how to get around this?

Much appreciated

You can use \B:

>>> re.sub(r'\B[0-9]+\B', "", "th1s n33ds to be r3m0v3d and this 2 doesnt")
'ths nds to be rmvd and this 2 doesnt'

Translation from regex into English: remove every sequence of digits that is located inside a word, that is, with a word character on both sides.

\B - Matches the empty string, but only when it is not at the beginning or end of a word.
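
To see where \B applies and where it does not, here is a small sketch (the helper name strip_inner_digits is mine, not part of the re module). It shows two caveats: digits at the very start or end of a word are left in place, and a standalone number with three or more digits loses its middle digits, because those also sit between two word characters.

```python
import re

def strip_inner_digits(text):
    """Remove digit runs that have a word character on both sides."""
    return re.sub(r'\B[0-9]+\B', '', text)

# Digits inside words are removed, the lone 2 is kept.
print(strip_inner_digits("th1s n33ds 2 go"))   # ths nds 2 go

# The leading 1 and the trailing 3 sit on a word boundary, so they survive.
print(strip_inner_digits("1th1s doesnt3"))     # 1ths doesnt3

# Caveat: the inner digits of a standalone number are removed too.
print(strip_inner_digits("year 1999"))         # year 19
```

So the \B pattern fits inputs like the one in the question, where standalone numbers are short and digits only appear in the middle of words.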

EDIT: if digits can also appear at the start or end of a word, then this regex will do:

>>> re.sub(r'([0-9]+(?=[a-z])|(?<=[a-z])[0-9]+)', "", "1th1s n33ds to be r3m0v3d and this 2 doesnt3")
'ths nds to be rmvd and this 2 doesnt'

Translation from regex into English: remove every run of digits that is immediately followed or preceded by a letter. This second regex is pretty ugly, and I'm sure there is a better way.
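
One possible alternative, offered as a sketch rather than a definitive answer: handle each whitespace-separated token on its own, keep tokens that consist only of digits, and strip the digits from every other token. The helper name strip_digits_in_words is hypothetical. The sketch assumes words are separated by whitespace, so a number with punctuation attached, such as "2,", counts as a mixed token and would lose its digit. Unlike the lookaround pattern with [a-z], it also handles uppercase letters.

```python
import re

def strip_digits_in_words(text):
    """Remove digits from any token that also contains non-digit characters."""
    def clean(match):
        token = match.group(0)
        # A token made only of digits "stands alone" and is kept unchanged.
        if token.isdigit():
            return token
        # Otherwise drop every ASCII digit from the token.
        return re.sub(r'[0-9]', '', token)

    # \S+ matches each run of non-whitespace; the original spacing is preserved.
    return re.sub(r'\S+', clean, text)

print(strip_digits_in_words("1Th1s n33ds to be r3m0v3d and this 2 doesnt3"))
# Ths nds to be rmvd and this 2 doesnt
```

Passing a function as the replacement argument of re.sub keeps the decision logic in plain Python, which may be easier to read and adjust than a single larger pattern.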

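This works if the goal is to drop every word that mixes letters and digits (the whole word, not only its digits) while leaving standalone numbers alone: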

re.sub(r'(?:[a-zA-Z]*[0-9]+[a-zA-Z]+)|(?:[a-zA-Z]+[0-9]+[a-zA-Z]*)',"","th1s n33ds to be r3m0v3d and this 2 doesnt this2")
# output 
'  to be  and this 2 doesnt '
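
The output above keeps the spaces that surrounded the deleted words. If that matters, one way to tidy it up is sketched below; the pattern is copied from this answer, while the names MIXED_WORD and drop_mixed_words are made up for the example.

```python
import re

# Pattern copied from the answer above: a word with digits mixed into letters.
MIXED_WORD = re.compile(
    r'(?:[a-zA-Z]*[0-9]+[a-zA-Z]+)|(?:[a-zA-Z]+[0-9]+[a-zA-Z]*)'
)

def drop_mixed_words(text):
    """Delete words that mix letters and digits, then collapse the whitespace."""
    stripped = MIXED_WORD.sub("", text)
    # split() with no arguments discards runs of whitespace and
    # any leading or trailing spaces; join() puts single spaces back.
    return " ".join(stripped.split())

print(drop_mixed_words("th1s n33ds to be r3m0v3d and this 2 doesnt this2"))
# to be and this 2 doesnt
```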


 