
Python Regex - using re.sub to clean up a string

I am having some problems using regex sub to remove numbers from strings. Input strings can look like:

"The Term' means 125 years commencing on and including 01 October 2015."

"125 years commencing on 25th December 1996"

"the term of 999 years from the 1st January 2011"

What I want to do is remove the number and the word 'years'. I am also parsing the string for dates using DateFinder, but DateFinder interprets the number as a date, which is why I want to remove it.

Any thoughts on a regular expression to remove the number and the word 'years'?

Try this to remove the numbers and the word 'years':

re.sub(r'\s+\d+|\s+years', '', text)

If, for instance:

text="The Term' means 125 years commencing on and including 01 October 2015."

then the output will be:

"The Term' means commencing on and including October."
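Because the alternation matches any whitespace-prefixed run of digits, the date loses its day and year as well, which matters if the string is parsed for dates afterwards. The following is a minimal sketch of a narrower pattern, assuming the number to drop is always followed directly by the word 'years'; the variable names `broad` and `narrow` are illustrative only.

```python
import re

text = "The Term' means 125 years commencing on and including 01 October 2015."

# Alternation: removes every whitespace-prefixed number, including date parts.
broad = re.sub(r'\s+\d+|\s+years', '', text)

# Narrower (assumption: the term length is always "<number> years"):
# a number is removed only when "years" follows it directly.
narrow = re.sub(r'\s*\b\d+\s+years\b', '', text)

print(broad)
print(narrow)
```

With the narrower pattern, "01 October 2015" stays intact for a date parser to pick up.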

I think this does what you want:

import re

my_list = ["The Term' means 125 years commencing on and including 01 October 2015.",
"125 years commencing on 25th December 1996",
"the term of 999 years from the 1st January 2011",
]

for item in my_list:
    new_item = re.sub(r"\d+\syears", "", item)
    print(new_item)

Results:

The Term' means  commencing on and including 01 October 2015.
 commencing on 25th December 1996
the term of  from the 1st January 2011

Note that you will end up with some extra whitespace (maybe you want that?), but you could add this to clean it up:

new_item = re.sub(r"\s+", " ", new_item)

Because I love regexes, here is one that trims leading and trailing whitespace:

new_item = re.sub(r"^\s+|\s+$", "", new_item)

Or, more simply, use str.strip():

new_item = new_item.strip()
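The removal and the whitespace clean-up can be combined into one step. This is only a sketch: the helper name `remove_term_length` is hypothetical, and it assumes the term length is always written as digits followed by "year" or "years".

```python
import re

# Hypothetical helper; the name is not from the answers above.
def remove_term_length(text):
    # Drop "<number> year" or "<number> years", case-insensitively.
    without_term = re.sub(r"\b\d+\s+years?\b", "", text, flags=re.IGNORECASE)
    # Collapse runs of whitespace into one space and trim both ends.
    return re.sub(r"\s+", " ", without_term).strip()

samples = [
    "The Term' means 125 years commencing on and including 01 October 2015.",
    "125 years commencing on 25th December 1996",
    "the term of 999 years from the 1st January 2011",
]

cleaned = [remove_term_length(s) for s in samples]
for line in cleaned:
    print(line)
```

Ordinals such as "25th" and "1st" are left alone because the pattern requires whitespace and then "year" right after the digits.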
