
Remove date sub-string based on a mask

I have the following text:

Filling a gap December 6, 2018 Slide 6 Small parts example. Padded details May 22, 2020 Slide 21 Adds to safety

I need to replace each date and slide reference with a . (dot) to get the following result:

Filling a gap. Small parts example. Padded details. Adds to safety

A mask like this could probably be used to identify the text to be removed:

{month} {day}, {year} {Slide} {slide number}

I can match the month using a regex as follows:

(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)

But how can I define the mask and put everything together? I am not sure whether regex is a proper solution or overkill.

Match the day as a number from 1 to 31 to make the pattern a bit more specific, and match Slide followed by one or more digits.

If you also match the whitespace before and after, and replace with a dot followed by a single space, you avoid leaving a double-space gap.

Replace with a dot and a single space (. ) using this pattern:

\s*(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?) \b(?:[1-9]|[12]\d|3[01])\b,\s+\d{4} Slide \d+\s*
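To make the parts of the pattern easier to follow, here is a sketch of the same expression written with re.VERBOSE so that each piece carries a comment. The name PATTERN is only illustrative. In verbose mode unescaped whitespace is ignored, so the literal spaces are written as [ ].

```python
import re

# Same pattern as above, split into commented parts with re.VERBOSE.
# Literal spaces are written as "[ ]" because verbose mode ignores bare whitespace.
PATTERN = re.compile(
    r"""
    \s*                                   # whitespace before the date
    (?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?
       |Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?
       |Dec(?:ember)?)                    # month, abbreviated or in full
    [ ]
    \b(?:[1-9]|[12]\d|3[01])\b            # day of the month, 1 to 31
    ,\s+
    \d{4}                                 # four-digit year
    [ ]Slide[ ]
    \d+                                   # slide number
    \s*                                   # whitespace after the slide number
    """,
    re.VERBOSE,
)

text = ("Filling a gap December 6, 2018 Slide 6 Small parts example. "
        "Padded details May 22, 2020 Slide 21 Adds to safety")
result = PATTERN.sub(". ", text)
print(result)
```

Because the day is restricted to 1 to 31 and wrapped in word boundaries, a string such as "May 32, 2020 Slide 1" is left untouched.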


import re

# Month, day (1-31), four-digit year, then "Slide" and the slide number,
# including the whitespace on both sides.
pattern = r"\s*(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?) \b(?:[1-9]|[12]\d|3[01])\b,\s+\d{4} Slide \d+\s*"
s = "Filling a gap December 6, 2018 Slide 6 Small parts example. Padded details May 22, 2020 Slide 21 Adds to safety"
print(re.sub(pattern, ". ", s))

Output

Filling a gap. Small parts example. Padded details. Adds to safety
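If you would rather keep the "mask" idea from the question, one possible approach is to treat the mask as a template and substitute a regex fragment for each placeholder. The sketch below is only an illustration: the names FRAGMENTS, MASK and mask_to_regex are hypothetical, the placeholder is spelled slide_number, and Slide is treated as literal text instead of a placeholder. The literal parts of the mask are escaped, so the comma must be followed by exactly one space here, which is slightly stricter than the \s+ used above.

```python
import re

# Hypothetical mapping from mask placeholders to regex fragments.
FRAGMENTS = {
    "month": (r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?"
              r"|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?"
              r"|Dec(?:ember)?)"),
    "day": r"\b(?:[1-9]|[12]\d|3[01])\b",
    "year": r"\d{4}",
    "slide_number": r"\d+",
}

# Assumed spelling of the mask; "Slide" is literal text.
MASK = "{month} {day}, {year} Slide {slide_number}"


def mask_to_regex(mask, fragments):
    """Turn a mask with {placeholders} into a compiled regex."""
    # Splitting on a capturing group keeps the placeholder names:
    # even indexes hold literal text, odd indexes hold placeholder names.
    parts = re.split(r"\{(\w+)\}", mask)
    pieces = []
    for index, part in enumerate(parts):
        if index % 2:
            pieces.append(fragments[part])   # placeholder -> regex fragment
        else:
            pieces.append(re.escape(part))   # literal text, escaped
    # Also consume the whitespace on both sides of the match.
    return re.compile(r"\s*" + "".join(pieces) + r"\s*")


mask_pattern = mask_to_regex(MASK, FRAGMENTS)
text = ("Filling a gap December 6, 2018 Slide 6 Small parts example. "
        "Padded details May 22, 2020 Slide 21 Adds to safety")
cleaned = mask_pattern.sub(". ", text)
print(cleaned)
```

Changing the layout then only requires editing MASK, for example reordering the placeholders, without touching the individual fragments.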

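Try this pattern, which also tolerates other date layouts such as a day before the month or a two-digit year: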

(?:\b\d{1,2}\D{0,3})?\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)\D?(?:\d{1,2}\D?)?\D?(?:(?:19[7-9]\d|20\d{2})|\d{2}) Slide \d+
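As posted, this pattern does not consume the whitespace around the match, so replacing with a bare dot would leave a space on either side of it. A minimal sketch of one way to apply it, assuming you wrap it in \s* on both sides and replace with a dot and a space (the names LOOSE_DATE and loose_pattern are only illustrative):

```python
import re

# The pattern from this answer, unchanged.
LOOSE_DATE = (
    r"(?:\b\d{1,2}\D{0,3})?\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?"
    r"|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?"
    r"|(?:Nov|Dec)(?:ember)?)\D?(?:\d{1,2}\D?)?\D?"
    r"(?:(?:19[7-9]\d|20\d{2})|\d{2}) Slide \d+"
)

# Assumption: surrounding whitespace is consumed so that ". " leaves no gap.
loose_pattern = re.compile(r"\s*" + LOOSE_DATE + r"\s*")

text = ("Filling a gap December 6, 2018 Slide 6 Small parts example. "
        "Padded details May 22, 2020 Slide 21 Adds to safety")
loose_result = loose_pattern.sub(". ", text)
print(loose_result)
```

The looser date part also accepts a day-first layout such as "6 Dec 2018 Slide 6", which the stricter pattern in the other answer does not match.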

