I am having some problems using regex sub to remove numbers from strings. Input strings can look like:
"The Term' means 125 years commencing on and including 01 October 2015."
"125 years commencing on 25th December 1996"
"the term of 999 years from the 1st January 2011"
What I want to do is remove the number and the word 'years'. I am also parsing the string for dates using DateFinder, but DateFinder interprets the number as a date, which is why I want to remove it.
Any thoughts on a regular expression to remove the number and the word 'years'?
Try this to remove the numbers and the word 'years':
re.sub(r'\s+\d+|\s+years', '', text)
If, for instance:
text="The Term' means 125 years commencing on and including 01 October 2015."
then the output will be:
"The Term' means commencing on and including October."
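One thing worth checking with this pattern: because \s+\d+ matches any run of digits preceded by whitespace, it also strips the day and year out of the date, and it leaves a number at the very start of the string alone. That may matter if the result is passed on to DateFinder. A minimal sketch that runs the pattern over the three sample strings from the question (the names samples, broad and stripped are only for illustration):

```python
import re

samples = [
    "The Term' means 125 years commencing on and including 01 October 2015.",
    "125 years commencing on 25th December 1996",
    "the term of 999 years from the 1st January 2011",
]

# The pattern from the answer above: whitespace + digits, or whitespace + "years".
broad = re.compile(r'\s+\d+|\s+years')

# Apply it to every sample so the effect on the dates is visible.
stripped = [broad.sub('', s) for s in samples]
for s in stripped:
    print(s)
```

On the second sample the leading 125 has no whitespace in front of it, so it survives, while the 25 and 1996 of the date are removed.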
I think this does what you want:
import re
my_list = ["The Term' means 125 years commencing on and including 01 October 2015.",
           "125 years commencing on 25th December 1996",
           "the term of 999 years from the 1st January 2011",
           ]
for item in my_list:
    # Remove only a number that is directly followed by the word "years".
    new_item = re.sub(r"\d+\syears", "", item)
    print(new_item)
results:
The Term' means commencing on and including 01 October 2015.
commencing on 25th December 1996
the term of from the 1st January 2011
Note that you will end up with some extra whitespace (maybe you want that?). You could add this to clean up repeated spaces:
new_item = re.sub(r"\s+", " ", new_item)
To trim leading and trailing whitespace as well, you can use another regex (because I love regexes):
new_item = re.sub(r"^\s+|\s+$", "", new_item)
or simply:
new_item = new_item.strip()
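The removal and the whitespace clean-up can be combined into one small function. This is only a sketch: the names TERM_RE and remove_term_length are made up for this example, and it assumes the number is always written in digits and followed by "year" or "years".

```python
import re

# Hypothetical pattern name. Matches digits followed by "year" or "years"
# as whole words, so digits inside dates such as "01 October 2015" are kept.
TERM_RE = re.compile(r'\b\d+\s+years?\b', re.IGNORECASE)


def remove_term_length(text):
    """Drop '<number> years' and tidy up the whitespace left behind."""
    without_term = TERM_RE.sub('', text)
    # split() with no arguments splits on any whitespace run and discards
    # leading/trailing whitespace, so joining with one space normalises it.
    return ' '.join(without_term.split())


if __name__ == "__main__":
    print(remove_term_length("125 years commencing on 25th December 1996"))
```

Using split() and join() here stands in for the two re.sub clean-up calls; either approach should leave the date untouched for DateFinder.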