I have these strings:
[[:File:Example.jpg]]
[[:File:Example.jpg|this example]]
[[Media:Example.jpg]]
[[Georgia (U.S. state)|Georgia]]
[[Arkansas]]
[[Canada]]
[[Virginia]]
[[Image:Houstonia longifolia - Long Leaf Bluet 2.jpg|thumb|left]]
I want to use re to extract only the strings that start with [[Image, [[Media: or [[:file:.
To find the strings beginning with [[:File:, you can use:
re.search(r"\[\[:File.*?]]", your_strings)
The same approach works for [[Media: and [[Image:
re.search(r"\[\[Media:.*?]]", your_strings)
re.search(r"\[\[Image.*?]]", your_strings)
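Note that re.search returns only the first match (as a match object, or None), so it has to be applied per string if you want every matching entry. A minimal sketch of that, assuming the strings are held in a list (the names `lines`, `patterns` and `matching` are hypothetical, not from the question):

```python
import re

# Hypothetical input: one wiki link per list element.
lines = [
    "[[:File:Example.jpg]]",
    "[[Media:Example.jpg]]",
    "[[Arkansas]]",
    "[[Image:Houstonia longifolia - Long Leaf Bluet 2.jpg|thumb|left]]",
]

# The three patterns from this answer.
patterns = [r"\[\[:File.*?]]", r"\[\[Media:.*?]]", r"\[\[Image.*?]]"]

# re.match only matches at the start of each string;
# keep a string if any of the three patterns matches it.
matching = [s for s in lines if any(re.match(p, s) for p in patterns)]
print(matching)
```

These patterns are case-sensitive, so a lowercase [[:file: prefix would need flags=re.IGNORECASE passed to re.match.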
Try this regex.
It matches only the links that begin with [[Image:, [[Media: or [[:File: (the re.IGNORECASE flag is also passed so that the prefix is matched in any letter case):
\[\[(?:Image|Media|:File):.+]]
Code:
import re
a = '''[[:File:Example.jpg]]
[[:File:Example.jpg|this example]]
[[Media:Example.jpg]]
[[Georgia (U.S. state)|Georgia]]
[[Arkansas]]
[[Canada]]
[[Virginia]]
[[Image:Houstonia longifolia - Long Leaf Bluet 2.jpg|thumb|left]]'''
print(re.findall(r'\[\[(?:Image|Media|:File):.+]]', a, flags=re.IGNORECASE))
Outputs:
[
'[[:File:Example.jpg]]',
'[[:File:Example.jpg|this example]]',
'[[Media:Example.jpg]]',
'[[Image:Houstonia longifolia - Long Leaf Bluet 2.jpg|thumb|left]]'
]
Tell me if it's not working...
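One caveat with the greedy .+ in that pattern: it works on the sample because each link sits on its own line, but if several links share a line it would swallow everything up to the last ]]. A minimal sketch of a non-greedy variant, assuming inline text like the hypothetical `text` below (the file names in it are made up for illustration):

```python
import re

# Hypothetical input with several links on a single line.
text = "See [[Media:A.jpg]] and [[Canada]] and [[:file:B.png|caption]]."

# Non-greedy .+? stops at the first closing ]], so each link is matched separately.
pattern = re.compile(r"\[\[(?:Image|Media|:File):.+?]]", re.IGNORECASE)

links = pattern.findall(text)
print(links)
```

Here [[Canada]] is skipped because it has none of the three prefixes, and the lowercase [[:file: link is still found thanks to re.IGNORECASE.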