I want to remove an indicator like Fig 1. from a string caption, where caption may be:
# each line is one instance of caption
"Figure 1: Path of Reading Materials from the Web to a Student."
"FIGURE 1 - Travel CP-net"
"Figure 1 Interpretation as abduction, the big picture."
"Fig. 1. The feature vector components"
"Fig 1: IMAGACT Log-in Page"
"FIG 1 ; The effect of descriptive and interpretive information, and Inclination o f Fit"
...
I've tried caption = re.sub(r'figure 1: |fig. 1 |figure 1 -', '', caption, flags=re.IGNORECASE), but it looks messy: do I really need to list all the possibilities manually? Is there a more elegant regex to match them all?
Thanks a bunch!
You might use an optional part to match ure, and an optional character class to match the :, ., ; or -.
If you want to match digits other than 1, use \d+ in place of the 1.
\bfig\.?(?:ure)? 1[^\S\r\n]*[:.;–-]?
\bfig matches fig preceded by a word boundary
\.? matches an optional dot
(?:ure)? optionally matches ure
" 1" matches a space followed by 1
[^\S\r\n]* matches 0+ occurrences of a whitespace char except newlines
[:.;–-]? optionally matches any one of the characters listed in the character class
Example code that also matches the whitespace after the character class:
caption = re.sub(r'\bfig\.?(?:ure)? 1[^\S\r\n]*[:.;–-]?[^\S\r\n]', '', caption, flags=re.IGNORECASE)
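As a rough sketch, the same pattern can be wrapped in a small helper and run against the sample captions from the question. The helper name strip_figure_label is made up for illustration, and two details go beyond the one-liner above and are assumptions: the literal 1 is swapped for \d+ so any figure number matches, and the trailing whitespace is made optional with * so a caption that is only a label is still matched.

```python
import re

# Pattern from the answer, with two assumed tweaks:
#   - "1" is generalised to \d+ (any figure number)
#   - the trailing whitespace is optional ([^\S\r\n]*)
FIGURE_LABEL = re.compile(
    r'\bfig\.?(?:ure)? \d+[^\S\r\n]*[:.;–-]?[^\S\r\n]*',
    flags=re.IGNORECASE,
)


def strip_figure_label(caption):
    """Return caption with any 'Fig 1:'-style label removed (hypothetical helper)."""
    return FIGURE_LABEL.sub('', caption)


# Sample captions taken from the question.
captions = [
    "Figure 1: Path of Reading Materials from the Web to a Student.",
    "FIGURE 1 - Travel CP-net",
    "Figure 1 Interpretation as abduction, the big picture.",
    "Fig. 1. The feature vector components",
    "Fig 1: IMAGACT Log-in Page",
]

for caption in captions:
    print(strip_figure_label(caption))
```

Compiling the pattern once keeps the call site short if many captions are processed. If the label should only ever be removed at the start of the caption, anchoring the pattern with ^ instead of \b would be a reasonable variation.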