Extract information from XML

Question

I'm using the etree module and trying to extract the information inside the <text ...> tag. Here is my XML file. If the content of <text ...> starts with {{Infobox film, I want to copy all the text between {{ and }}. Is this possible? Thanks.

Update: XML file updated

Answer 1

The following snippet should do what you want:

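import re
from xml.etree import ElementTree

# Parse the MediaWiki export; passing the filename lets the parser handle the encoding
xml = ElementTree.parse('films.xml')

# <text> elements live in the MediaWiki export namespace
for t in xml.findall('.//{http://www.mediawiki.org/xml/export-0.5/}text'):
    print('====================')
    # t.text is None for an empty <text/> element, so fall back to ''
    m = re.search(r'(?s).*?{{(Infobox film.*?)}}', t.text or '')
    if m:
        print(m.group(1))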

The regular expression begins with (?s), which turns on the DOTALL option, so that . matches newlines as well as any other character. The two instances of .*? are non-greedy matches of any character, i.e. each one matches the shortest stretch of zero or more characters that still lets the rest of the expression match.
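
To try the pattern without the real export file, here is a minimal sketch that runs the same regular expression against a small in-memory document. The SAMPLE string, the film details in it, and the extract_infoboxes helper are made up for illustration; only the namespace URI and the pattern come from the snippet above.

```python
import re
from xml.etree import ElementTree

# Namespace used by the MediaWiki export in the answer above.
NS = '{http://www.mediawiki.org/xml/export-0.5/}'

# Hypothetical sample standing in for the real export file.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/">
  <page>
    <title>Example Film</title>
    <revision>
      <text>{{Infobox film
| name = Example Film
| director = Jane Doe
}}
'''Example Film''' is a film.</text>
    </revision>
  </page>
  <page>
    <title>Not a film</title>
    <revision>
      <text>Just prose, no infobox.</text>
    </revision>
  </page>
</mediawiki>"""

# Same pattern as in the answer: DOTALL plus two non-greedy matches.
PATTERN = re.compile(r'(?s).*?{{(Infobox film.*?)}}')


def extract_infoboxes(xml_string):
    """Return the text between {{ and }} for every 'Infobox film' found."""
    root = ElementTree.fromstring(xml_string)
    found = []
    for t in root.iter(NS + 'text'):
        # t.text is None for an empty element, so fall back to ''.
        m = PATTERN.search(t.text or '')
        if m:
            found.append(m.group(1))
    return found


infoboxes = extract_infoboxes(SAMPLE)
for box in infoboxes:
    print(box)
```

Because the second .*? is non-greedy, the match ends at the first }} after "Infobox film". If an infobox contains a nested template such as {{x}}, the captured text is cut off at that inner }}, so real Wikipedia pages may need brace counting instead of a single regular expression.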

1 answer

solution1
2 ACCEPTED 2011-10-20 11:13:27
