Python Regex: replace multiple possibilities of substring
I want to remove an indicator such as Fig 1. from the string caption, where caption may be:
# each line is one instance of caption
"Figure 1: Path of Reading Materials from the Web to a Student."
"FIGURE 1 - Travel CP-net"
"Figure 1 Interpretation as abduction, the big picture."
"Fig. 1. The feature vector components"
"Fig 1: IMAGACT Log-in Page"
"FIG 1 ; The effect of descriptive and interpretive information, and Inclination o f Fit"
...
I've tried
caption = re.sub(r'figure 1: |fig. 1 |figure 1 -', '', caption, flags=re.IGNORECASE)
but it looks messy: do I really need to list all the possibilities manually? Is there a more elegant regex that matches them all?
Thanks a bunch!
You might use an optional part to match ure, and an optional character class to match the :, ., ; or - that may follow the number.
If you want to match digits other than 1, use \d+ in place of the 1.
\bfig\.?(?:ure)? 1[^\S\r\n]*[:.;–-]?
Explanation of the pattern, part by part:
\bfig  Match fig preceded by a word boundary
\.?  Match an optional dot
(?:ure)?  Optionally match ure
 1  Match a literal space followed by 1
[^\S\r\n]*  Match 0+ occurrences of a whitespace char except newlines
[:.;–-]?  Optionally match any one of the characters listed in the character class
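The breakdown above can also be written into the pattern itself. The following is a minimal sketch using re.VERBOSE; the name FIG_PREFIX is only an illustrative choice, not part of the original answer. In verbose mode unescaped whitespace is ignored, so the literal space before the 1 is written as [ ].

```python
import re

# Same pattern as above, commented part by part via re.VERBOSE.
# FIG_PREFIX is a hypothetical name chosen for this sketch.
FIG_PREFIX = re.compile(
    r"""
    \bfig          # fig preceded by a word boundary
    \.?            # optional dot, as in Fig.
    (?:ure)?       # optional ure, as in Figure
    [ ]1           # a literal space followed by 1
    [^\S\r\n]*     # zero or more whitespace chars, excluding newlines
    [:.;–-]?       # optional separator from the character class
    """,
    re.IGNORECASE | re.VERBOSE,
)

print(FIG_PREFIX.sub("", "Fig. 1. The feature vector components"))
```

Note that this version, like the bare pattern, leaves the whitespace after the separator in place.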
Example code that also matches the whitespace after the character class:
caption = re.sub(r'\bfig\.?(?:ure)? 1[^\S\r\n]*[:.;–-]?[^\S\r\n]', '', caption, flags=re.IGNORECASE)
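If the figure number is not always 1, the \d+ suggestion can be combined with the pattern above. The sketch below assumes that any run of digits is a valid figure number and that all trailing spaces should be dropped, so it ends in [^\S\r\n]* rather than a single whitespace character. The helper name strip_figure_label and the last caption in the list are made up for illustration.

```python
import re

# Assumption: any figure number may appear, so " 1" becomes " \d+".
# The trailing [^\S\r\n]* consumes whatever spaces follow the separator.
PATTERN = re.compile(
    r"\bfig\.?(?:ure)? \d+[^\S\r\n]*[:.;–-]?[^\S\r\n]*",
    re.IGNORECASE,
)

def strip_figure_label(caption: str) -> str:
    """Remove a 'Fig N' style label (hypothetical helper for this sketch)."""
    return PATTERN.sub("", caption)

captions = [
    "Figure 1: Path of Reading Materials from the Web to a Student.",
    "FIGURE 1 - Travel CP-net",
    "Figure 1 Interpretation as abduction, the big picture.",
    "Fig. 1. The feature vector components",
    "Fig 1: IMAGACT Log-in Page",
    "Fig. 12 - Hypothetical caption",  # invented example with another number
]

for caption in captions:
    print(strip_figure_label(caption))
```

Because re.sub replaces every match, a caption that mentions another figure later in its text (for example "see Fig 2") would lose that mention too; anchoring the pattern with ^ would be one way to restrict it to the start of the string.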