
Using regex in xml etree parsing

I need to parse an XML file and find only the values that start with "123". How can I do this using the code below? Is it possible to use regex inside this syntax?

import xml.etree.ElementTree as ET

tree = ET.parse('xml.xml')
# A trailing "/" in an ElementTree path selects every child of <system>
print([events.text for record in tree.findall('./configuration/system/') for events in record.findall('events')])

xml.xml

<rpc-reply>
  <configuration>
    <system>
      <preference>
        <events>123</events>
        <events>124</events>
        <events>1235</events>
      </preference>
    </system>
  </configuration>
</rpc-reply>

An XPath predicate can do this with the built-in function starts-with(), but you need a library that fully supports XPath 1.0, such as lxml:

from lxml import etree as ET

raw = '''<rpc-reply>
  <configuration>
    <system>
      <preference>
        <events>123</events>
        <events>124</events>
        <events>1235</events>
      </preference>
    </system>
  </configuration>
</rpc-reply>'''
root = ET.fromstring(raw)
# The path is relative to the root <rpc-reply>; the predicate keeps only
# <events> elements whose string value starts with "123"
query = 'configuration/system/preference/events[starts-with(., "123")]'
print([events.text for events in root.xpath(query)])  # ['123', '1235']

If you still want to use regex, lxml supports it even though the XPath 1.0 specification does not include regular expressions (see: Regex in lxml for python).
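A minimal sketch of the regex route, assuming lxml's support for the EXSLT regular-expressions extension: the prefix `re` is bound to the namespace `http://exslt.org/regular-expressions`, and `re:test()` is called inside the predicate. The pattern `^123` and the variable names are illustrative choices, not part of the original answer.

```python
from lxml import etree

raw = """<rpc-reply>
  <configuration>
    <system>
      <preference>
        <events>123</events>
        <events>124</events>
        <events>1235</events>
      </preference>
    </system>
  </configuration>
</rpc-reply>"""

root = etree.fromstring(raw)

# lxml implements the EXSLT regular-expressions functions; the prefix "re"
# must be mapped to that namespace when the query is evaluated
ns = {"re": "http://exslt.org/regular-expressions"}

# re:test(string, pattern) is true when the pattern matches the string value
query = 'configuration/system/preference/events[re:test(., "^123")]'
matches = [e.text for e in root.xpath(query, namespaces=ns)]
print(matches)  # ['123', '1235']
```

Since `re:test()` accepts a full regular expression, the same query shape can express patterns that starts-with() cannot, such as anchored matches on the whole value.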

xml.etree only supports a limited subset of XPath 1.0 expressions, which does not include the starts-with() function (and certainly not regex). So you need to rely on Python's str.startswith() to do the check:

....
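query = 'configuration/system/preference/events'
# findall() cannot filter on text content, so filter in Python;
# "or ''" guards against empty elements whose .text is None
print([events.text for events in root.findall(query) if (events.text or '').startswith('123')])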
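For completeness, here is a self-contained sketch that uses only the standard library: select the `<events>` elements with `findall()`, then filter in Python, either with `str.startswith()` or with the `re` module when a real regular expression is needed. The inline XML string stands in for `xml.xml` so the example runs on its own; the variable names are illustrative.

```python
import re
import xml.etree.ElementTree as ET

raw = """<rpc-reply>
  <configuration>
    <system>
      <preference>
        <events>123</events>
        <events>124</events>
        <events>1235</events>
      </preference>
    </system>
  </configuration>
</rpc-reply>"""

root = ET.fromstring(raw)

# ElementPath has no starts-with() or regex support, so select every <events>...
elements = root.findall("configuration/system/preference/events")

# ...then filter in Python: str.startswith() covers the plain prefix case
prefix_matches = [e.text for e in elements if (e.text or "").startswith("123")]

# re.match() only matches at the beginning of the string, so it acts as a
# prefix test here and also allows richer patterns when needed
pattern = re.compile(r"123")
regex_matches = [e.text for e in elements if e.text and pattern.match(e.text)]

print(prefix_matches)  # ['123', '1235']
print(regex_matches)   # ['123', '1235']
```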
