简体   繁体   中英

Using regex in python to find a string

I'm trying to find a substring of a string s, starting with {{Infobox and ending with }} . I tried doing this with a regular expression, but it doesn't get any results. I think the fault is in my regular expression, but since I'm quitte new to regex, I hope someone can help with this. String s is for example:

s = '{{blabla}}{{Infobox persoon Tweede Wereldoorlog| naam=Albert Speer| afbeelding=Albert Speer Neurenberg.JPG}}{{blabla}}'

result = re.search('(.*)\{\{Infobox (.*)\}\}(.*)', s)
if result:
    print(result.group(2))

You can use lazy dot matching since your delimiters are not one-symbol delimiters, and capture what you need into group 1:

import re
p = re.compile(r'\{\{Infobox\s*(.*?)}}')
test_str = "{{blabla}}{{Infobox persoon Tweede Wereldoorlog| naam=Albert Speer| afbeelding=Albert Speer Neurenberg.JPG}}{{blabla}}"
match = p.search(test_str)
if match:
    print(match.group(1))

See IDEONE demo

If you use a negated character class, any { or } inside the Infobox will prevent from matching the whole substring.

Also, since you do not seem to need the substrings before and after the substring you need, you do not need to match (or capture) them at all (thus, I removed them).

Code:

import re
s = '{{blabla}}{{Infobox persoon Tweede Wereldoorlog| naam=Albert Speer| afbeelding=Albert Speer Neurenberg.JPG}}{{blabla}}'

result = re.search(r'(.*){{Infobox ([^}]*?)}}(.*)', s)
if result:
    print(result.group(2))

Output:

persoon Tweede Wereldoorlog| naam=Albert Speer| afbeelding=Albert Speer Neurenberg.JPG

NOTE : The above regex will match till it meets the first } after {{Infobox .

Important note:

This will only work for the cases like the given sample input

It will not work if the input has a } in between ie){{blabla}}{{Infobox persoon Tweede Wereldoorlog| naam=Albert Speer| }afbeelding=Albert Speer Neurenberg.JPG}}{{blabla}} ie){{blabla}}{{Infobox persoon Tweede Wereldoorlog| naam=Albert Speer| }afbeelding=Albert Speer Neurenberg.JPG}}{{blabla}} ie){{blabla}}{{Infobox persoon Tweede Wereldoorlog| naam=Albert Speer| }afbeelding=Albert Speer Neurenberg.JPG}}{{blabla}} For cases like that stribizhev's answer is the best solution

s = '{{blabla}}{{Infobox persoon Tweede Wereldoorlog| naam=Albert Speer| afbeelding=Albert Speer Neurenberg.JPG}}{{blabla}}'

# start with Infobox and two chars before, grab everything but '}', followed by two chars
mo = re.search(r'(..Infobox[^}]*..)',s)


print(mo.group(1))


# {{Infobox persoon Tweede Wereldoorlog| naam=Albert Speer| afbeelding=Albert Speer Neurenberg.JPG}}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM