I have the following text:
Filling a gap December 6, 2018 Slide 6 Small parts example. Padded details May 22, 2020 Slide 21 Adds to safety
I need to replace each date + slide part with a dot (.) to get the following result:
Filling a gap. Small parts example. Padded details. Adds to safety
A mask like the following could probably be used to identify the text to be removed:
{month} {day}, {year} {Slide} {slide number}
I can match the month using a regex as follows:
(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)
But how can I define the full mask and put everything together? I am not sure whether a regex is the proper solution or overkill.
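To make the match a bit more specific, restrict the day to 1-31 and require Slide followed by one or more digits.
If you also match the whitespace before and after the date, and replace the whole match with a dot and a single space, you avoid leaving a double space or a space before the dot.
Replace with ". " (a dot followed by a space), using this pattern: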
\s*(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?) \b(?:[1-9]|[12]\d|3[01])\b,\s+\d{4} Slide \d+\s*
import re

# Optional whitespace, month name, day 1-31, comma, 4-digit year, "Slide <n>", optional whitespace
pattern = r"\s*(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?) \b(?:[1-9]|[12]\d|3[01])\b,\s+\d{4} Slide \d+\s*"
s = "Filling a gap December 6, 2018 Slide 6 Small parts example. Padded details May 22, 2020 Slide 21 Adds to safety"

# Replace every date + slide run, including the surrounding spaces, with ". "
print(re.sub(pattern, ". ", s))
Output:
Filling a gap. Small parts example. Padded details. Adds to safety
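To address the "how do I put the mask together" part of the question, here is a minimal sketch that assembles the same pattern from one fragment per mask field. The names MONTH, DAY, YEAR, SLIDE, DATE_SLIDE and strip_date_slide are hypothetical, chosen only for this illustration; the resulting regex is intended to be equivalent to the one above.

```python
import re

# Hypothetical building blocks, one per field of the mask
# {month} {day}, {year} Slide {slide number}
MONTH = (
    r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?"
    r"|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)"
)
DAY = r"(?:[1-9]|[12]\d|3[01])"   # 1-31, no leading zero
YEAR = r"\d{4}"
SLIDE = r"Slide \d+"

# Surrounding whitespace is part of the match so it is replaced as well.
DATE_SLIDE = re.compile(rf"\s*{MONTH} \b{DAY}\b,\s+{YEAR} {SLIDE}\s*")


def strip_date_slide(text):
    """Replace each 'Month D, YYYY Slide N' run with a dot and a space."""
    return DATE_SLIDE.sub(". ", text)


s = ("Filling a gap December 6, 2018 Slide 6 Small parts example. "
     "Padded details May 22, 2020 Slide 21 Adds to safety")
print(strip_date_slide(s))
```

Keeping each field in its own variable makes it easier to adjust one part later (for example, allowing a leading zero in the day) without re-reading the whole expression.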
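Alternatively, try this more permissive pattern, which also accepts a day before the month and a two-digit year: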
(?:\b\d{1,2}\D{0,3})?\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)\D?(?:\d{1,2}\D?)?\D?(?:(?:19[7-9]\d|20\d{2})|\d{2}) Slide \d+
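As a quick check of this pattern against the sample text, here is a small sketch. Note that the pattern does not consume the whitespace around the date, so substituting a bare dot leaves a space before it. Wrapping the pattern in \s* on both sides and replacing with ". " is one possible adjustment; the names loose, bare and wrapped are assumptions made for this example.

```python
import re

# The permissive pattern from this answer, split across lines for readability
loose = (
    r"(?:\b\d{1,2}\D{0,3})?\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?"
    r"|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?"
    r"|(?:Nov|Dec)(?:ember)?)\D?(?:\d{1,2}\D?)?\D?"
    r"(?:(?:19[7-9]\d|20\d{2})|\d{2}) Slide \d+"
)

s = ("Filling a gap December 6, 2018 Slide 6 Small parts example. "
     "Padded details May 22, 2020 Slide 21 Adds to safety")

# Used as-is, the spaces around each match are kept.
bare = re.sub(loose, ".", s)
print(bare)     # Filling a gap . Small parts example. Padded details . Adds to safety

# Assumed tweak: also match the surrounding whitespace, then insert ". ".
wrapped = re.sub(r"\s*" + loose + r"\s*", ". ", s)
print(wrapped)  # Filling a gap. Small parts example. Padded details. Adds to safety
```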