Python regex: replacing several possible variants of a substring

Question

I want to remove an indicator like "Fig 1." from the string caption, where caption may be:

# each line is one instance of caption
"Figure 1: Path of Reading Materials from the Web to a Student."
"FIGURE 1 - Travel CP-net"
"Figure 1 Interpretation as abduction, the big picture."
"Fig. 1. The feature vector components"
"Fig 1: IMAGACT Log-in Page"
"FIG 1 ; The effect of descriptive and interpretive information, and Inclination o f Fit"
...

I've tried caption = re.sub(r'figure 1: |fig. 1 |figure 1 -', '', caption, flags=re.IGNORECASE), but it looks messy: do I really need to list all the possibilities manually? Is there a more elegant regex that matches them all?
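As a quick check of why hand-listed alternatives are fragile, the sketch below runs the attempted pattern over three of the sample captions. The unescaped "." in "fig. 1 " matches any single character rather than a literal dot, and no alternative covers the exact spacing of "Fig. 1." or "Fig 1:". The variable names are illustrative only.

```python
import re

# The alternation from the question, unchanged. The "." in "fig. 1 " is not
# escaped, so it matches any character rather than a literal dot.
attempt = r'figure 1: |fig. 1 |figure 1 -'

# Three of the sample captions from the question.
captions = [
    "Figure 1: Path of Reading Materials from the Web to a Student.",
    "Fig. 1. The feature vector components",
    "Fig 1: IMAGACT Log-in Page",
]

# Apply the attempted pattern case-insensitively to each caption.
results = [re.sub(attempt, '', c, flags=re.IGNORECASE) for c in captions]
for r in results:
    print(repr(r))
```

Only the first caption is cleaned; the other two come back unchanged because none of the three alternatives matches them exactly.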

Thanks a bunch!

Answer 1

You might use an optional group to match "ure" and an optional character class to match :, ., ; or a dash (- or –).

If you want to match digits other than 1, use \d+.
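A minimal sketch of that change, assuming the answer's pattern \bfig\.?(?:ure)? 1[^\S\r\n]*[:.;–-]? with the literal 1 swapped for \d+. The sample text is made up for illustration.

```python
import re

# The answer's pattern with "1" generalised to \d+, so any figure number matches.
fig_prefix = re.compile(r'\bfig\.?(?:ure)? \d+[^\S\r\n]*[:.;–-]?', re.IGNORECASE)

# Hypothetical text containing two differently numbered figure references.
text = "See Fig. 3. and Figure 12: for details"

# The pattern has only a non-capturing group, so findall returns whole matches.
found = fig_prefix.findall(text)
print(found)  # ['Fig. 3.', 'Figure 12:']
```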

\bfig\.?(?:ure)? 1[^\S\r\n]*[:.;–-]?

\bfig : Match "fig" preceded by a word boundary
\.? : Match an optional dot
(?:ure)? : Optionally match "ure"
" 1" : Match a space followed by 1
[^\S\r\n]* : Match 0+ whitespace characters, excluding newlines
[:.;–-]? : Optionally match one of the characters listed in the character class
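The same pattern can be spelled with re.VERBOSE so that each piece carries the explanation above as an inline comment. This is an equivalent rewrite offered as a sketch, not part of the original answer; note that verbose mode ignores unescaped spaces, so the space before 1 is written as "\ ".

```python
import re

# The answer's pattern, one token per line, with the breakdown as comments.
FIG_PREFIX = re.compile(
    r"""
    \bfig          # "fig" preceded by a word boundary
    \.?            # optional dot, as in "Fig."
    (?:ure)?       # optional "ure", as in "Figure"
    \ 1            # a space (escaped, since VERBOSE ignores bare spaces), then 1
    [^\S\r\n]*     # zero or more whitespace characters, excluding newlines
    [:.;–-]?       # optional separator: colon, dot, semicolon, en dash or hyphen
    """,
    re.IGNORECASE | re.VERBOSE,
)

samples = [
    "Fig. 1. The feature vector components",
    "FIGURE 1 - Travel CP-net",
    "Figure 1 Interpretation as abduction, the big picture.",
]

# Show exactly which prefix the pattern consumes in each caption.
matches = [FIG_PREFIX.match(s).group() for s in samples]
for m in matches:
    print(repr(m))
```

The first two matches stop right after the separator and leave the following space in the caption, which is the gap the answer's final snippet closes by consuming one more whitespace character.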


Example code that also matches the whitespace after the character class:

import re
caption = re.sub(r'\bfig\.?(?:ure)? 1[^\S\r\n]*[:.;–-]?[^\S\r\n]', '', caption, flags=re.IGNORECASE)
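To check that snippet against every sample caption from the question, a small loop is enough. The list names captions and cleaned are only for illustration.

```python
import re

# All sample captions from the question.
captions = [
    "Figure 1: Path of Reading Materials from the Web to a Student.",
    "FIGURE 1 - Travel CP-net",
    "Figure 1 Interpretation as abduction, the big picture.",
    "Fig. 1. The feature vector components",
    "Fig 1: IMAGACT Log-in Page",
    "FIG 1 ; The effect of descriptive and interpretive information, and Inclination o f Fit",
]

# Same substitution as the answer, applied to each caption in turn.
cleaned = [
    re.sub(r'\bfig\.?(?:ure)? 1[^\S\r\n]*[:.;–-]?[^\S\r\n]', '', caption, flags=re.IGNORECASE)
    for caption in captions
]
for line in cleaned:
    print(line)
```

Each caption comes back with its prefix removed, for example "Travel CP-net" and "The feature vector components". For "Figure 1 Interpretation ...", which has no separator, the regex engine backtracks so that the single trailing whitespace character absorbs the space after 1.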

Accepted answer: Answer 1 (2020-05-27 13:27:23)
