
Remove date sub-string based on a mask

I have the following text:

Filling a gap December 6, 2018 Slide 6 Small parts example. Padded details May 22, 2020 Slide 21 Adds to safety

I need to replace each date + slide run with . (a dot) to get the following result:

Filling a gap. Small parts example. Padded details. Adds to safety

Possibly a mask could be used to identify the text to remove:

{month} {day}, {year} {Slide} {slide number}

I can match the month with a regular expression like this:

(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)

But how do I define the mask and put everything together? I am not sure whether a regular expression is an appropriate solution or overkill.

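Match the day as a number from 1 to 31 to make the pattern more specific, then match Slide followed by one or more digits.

If you also match the whitespace before and after the date, and replace the whole match with a dot and a single space, the result has no double-space gaps.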
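To make the whitespace point concrete, here is a minimal sketch that compares the two variants on a shortened version of the sample text. The names MONTH, DAY and DATE_SLIDE are chosen for this illustration only; the fragments themselves come from the question and this answer.

```python
import re

# Month alternation from the question; the constant names are illustrative only.
MONTH = (r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|"
         r"Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)")
DAY = r"\b(?:[1-9]|[12]\d|3[01])\b"  # accepts 1 to 31 only
DATE_SLIDE = MONTH + " " + DAY + r",\s+\d{4} Slide \d+"

s = "Filling a gap December 6, 2018 Slide 6 Small parts example."

# Without matching the surrounding whitespace, the original spaces stay in place.
loose = re.sub(DATE_SLIDE, ". ", s)
# With \s* on both sides, only the replacement ". " decides the spacing.
tight = re.sub(r"\s*" + DATE_SLIDE + r"\s*", ". ", s)

print(repr(loose))  # 'Filling a gap .  Small parts example.'
print(repr(tight))  # 'Filling a gap. Small parts example.'
```

The day fragment also rejects out-of-range values: a string such as "December 32, 2018 Slide 6" is not matched by DATE_SLIDE.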

Replace with ". " (a dot followed by a single space).

\s*(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?) \b(?:[1-9]|[12]\d|3[01])\b,\s+\d{4} Slide \d+\s*

Regex demo

import re

# Optional whitespace, month name, day 1-31, comma, 4-digit year, "Slide" plus its number, optional whitespace
pattern = r"\s*(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?) \b(?:[1-9]|[12]\d|3[01])\b,\s+\d{4} Slide \d+\s*"
s = "Filling a gap December 6, 2018 Slide 6 Small parts example. Padded details May 22, 2020 Slide 21 Adds to safety"
# Replace every date + slide run with a dot and a single space
print(re.sub(pattern, ". ", s))

Output

Filling a gap. Small parts example. Padded details. Adds to safety
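If you prefer to keep the mask from the question as the starting point, one possible approach is to map each placeholder to a regex fragment and build the pattern from the mask string. The following is only a sketch under that assumption: FRAGMENTS and mask_to_regex are hypothetical names introduced here, not part of the re module, and the fragments reuse the pieces shown above.

```python
import re

# Hypothetical mapping for this sketch: placeholder name -> regex fragment.
FRAGMENTS = {
    "month": (r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|"
              r"Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|"
              r"Dec(?:ember)?)"),
    "day": r"\b(?:[1-9]|[12]\d|3[01])\b",
    "year": r"\d{4}",
    "Slide": r"Slide",
    "slide number": r"\d+",
}


def mask_to_regex(mask):
    """Build a compiled regex from a '{placeholder}' style mask (illustrative helper)."""
    # Splitting on a capturing group alternates literal text and placeholder names.
    parts = re.split(r"\{([^{}]+)\}", mask)
    out = []
    for i, part in enumerate(parts):
        if i % 2:
            # Odd positions are placeholder names.
            out.append(FRAGMENTS[part])
        else:
            # Even positions are literal text; any whitespace run becomes \s+.
            out.append(r"\s+".join(re.escape(tok) for tok in re.split(r"\s+", part)))
    # Also consume the whitespace around the match, as in the answer above.
    return re.compile(r"\s*" + "".join(out) + r"\s*")


MASK = "{month} {day}, {year} {Slide} {slide number}"
s = ("Filling a gap December 6, 2018 Slide 6 Small parts example. "
     "Padded details May 22, 2020 Slide 21 Adds to safety")

result = mask_to_regex(MASK).sub(". ", s)
print(result)  # Filling a gap. Small parts example. Padded details. Adds to safety
```

Whether this is worth the extra indirection depends on how many different masks you expect; for a single fixed layout the plain pattern is probably simpler.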

Try this:

(?:\b\d{1,2}\D{0,3})?\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)\D?(?:\d{1,2}\D?)?\D?(?:(?:19[7-9]\d|20\d{2})|\d{2}) Slide \d+

Demo
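This pattern does not consume the whitespace around the date, so a plain substitution would leave stray spaces. A minimal sketch of how it might be applied in Python follows; wrapping the pattern in \s* and the helper name strip_dates are assumptions made for this example, not part of the answer.

```python
import re

# The pattern from this answer, split across lines for readability only.
DATE_SLIDE = (
    r"(?:\b\d{1,2}\D{0,3})?\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|"
    r"Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)"
    r"\D?(?:\d{1,2}\D?)?\D?(?:(?:19[7-9]\d|20\d{2})|\d{2}) Slide \d+"
)

# Assumption for this sketch: also swallow the whitespace on both sides.
WRAPPED = re.compile(r"\s*(?:" + DATE_SLIDE + r")\s*")


def strip_dates(text):
    """Replace each date + slide run with a dot and one space (illustrative helper)."""
    return WRAPPED.sub(". ", text)


s = ("Filling a gap December 6, 2018 Slide 6 Small parts example. "
     "Padded details May 22, 2020 Slide 21 Adds to safety")
print(strip_dates(s))  # Filling a gap. Small parts example. Padded details. Adds to safety
```

Compared with the first answer, this pattern is more permissive: it also accepts a day written before the month and a two-digit year, which may or may not be what you want.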

