Python re.findall: organize the results into a list per country

I have a text file with entries like this:

 <soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <soap:Body> <Applications_GetResponse xmlns="http://www.country.com"> <Applications> <CS_Application> <Name>Spain</Name> <Key>2345364564</Key> <Status>NORMAL</Status> <Modules> <CS_Module> <Name>zaragoza</Name> <Key>8743249725</Key> <DevelopmentEffort>0</DevelopmentEffort> <LogicalDBConnections/> </CS_Module> <CS_Module> <Name>malaga</Name> <Key>8743249725</Key> <DevelopmentEffort>0</DevelopmentEffort> <LogicalDBConnections/> </CS_Module> </Modules> <CreatedBy>7</CreatedBy> </CS_Application> <CS_Application> <Name>UK</Name> <Key>2345364564</Key> <Status>NORMAL</Status> <Modules> <CS_Module> <Name>london</Name> <Key>8743249725</Key> <DevelopmentEffort>0</DevelopmentEffort> <LogicalDBConnections/> </CS_Module> <CS_Module> <Name>liverpool</Name> <Key>8743249725</Key> <DevelopmentEffort>0</DevelopmentEffort> <LogicalDBConnections/> </CS_Module> </Modules> <CreatedBy>7</CreatedBy> </CS_Application> </Applications> </Applications_GetResponse> </soap:Body> </soap:Envelope>

I would like to parse it and get each country name followed by the cities that belong to it.

I tried a few things with Python's re.findall, but I couldn't get that result:

print("HERE APPLICATIONS")
applications = re.findall('<CS_Application><Name>(.*?)</Name>', response_apply.text)
print(applications)
print("HERE MODULES")
modules = re.findall('<CS_Module><Name>(.*?)</Name>', response_apply.text)
print(modules)

Output:

host-10$ sudo python3 capture.py 
HERE APPLICATIONS
['Spain', 'UK']
HERE MODULES
['zaragoza', 'malaga', 'london', 'liverpool']

The expected result looks like this:

HERE
The Country: Spain - Cities: zaragoza,malaga
The Country: UK - Cities: london,liverpool

Regex is not a good tool for parsing XML; an XML parser is the better choice. If you still want a regex solution, the code below should help:

import re

s = """\n<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">\n   <soap:Body>\n      <Applications_GetResponse xmlns="http://www.country.com">\n         <Applications>\n            <CS_Application>\n               <Name>Spain</Name>\n               <Key>2345364564</Key>\n               <Status>NORMAL</Status>\n               <Modules>\n                  <CS_Module>\n                     <Name>zaragoza</Name>\n                     <Key>8743249725</Key>\n                     <DevelopmentEffort>0</DevelopmentEffort>\n                     <LogicalDBConnections/>\n                  </CS_Module>\n                  <CS_Module>\n                     <Name>malaga</Name>\n                     <Key>8743249725</Key>\n                     <DevelopmentEffort>0</DevelopmentEffort>\n                     <LogicalDBConnections/>\n                  </CS_Module>\n               </Modules>\n               <CreatedBy>7</CreatedBy>\n            </CS_Application>\n            <CS_Application>\n               <Name>UK</Name>\n               <Key>2345364564</Key>\n               <Status>NORMAL</Status>\n               <Modules>\n                  <CS_Module>\n                     <Name>london</Name>\n                     <Key>8743249725</Key>\n                     <DevelopmentEffort>0</DevelopmentEffort>\n                     <LogicalDBConnections/>\n                  </CS_Module>\n                  <CS_Module>\n                     <Name>liverpool</Name>\n                     <Key>8743249725</Key>\n                     <DevelopmentEffort>0</DevelopmentEffort>\n                     <LogicalDBConnections/>\n                  </CS_Module>\n               </Modules>\n               <CreatedBy>7</CreatedBy>\n            </CS_Application>\n        </Applications>\n      </Applications_GetResponse>\n   </soap:Body>\n</soap:Envelope>\n"""
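# Capture the body of each <CS_Application> block; [\s\S] also matches newlines.
pattern1 = re.compile(r'<CS_Application>([\s\S]*?)</CS_Application>')
# Non-greedy, so it still works when several <Name> tags share one line.
pattern2 = re.compile(r'<Name>(.*?)</Name>')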

for m in pattern1.finditer(s):
    # The first <Name> in a block is the country; the rest are its cities.
    names = pattern2.findall(m.group(1))
    print("The Country: " + names[0] + " - Cities: " + ",".join(names[1:]))
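
Since an XML parser is the recommended route, here is a minimal sketch of the same grouping with the standard library's xml.etree.ElementTree. It assumes the response has the shape shown in the question, with the http://www.country.com default namespace on Applications_GetResponse. The shortened SAMPLE string, the "c" prefix, and the countries_with_cities helper are names made up for this illustration, not part of the original code.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the same shape as the SOAP response from the question.
SAMPLE = """<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
<soap:Body>
<Applications_GetResponse xmlns="http://www.country.com">
<Applications>
<CS_Application><Name>Spain</Name><Modules>
<CS_Module><Name>zaragoza</Name></CS_Module>
<CS_Module><Name>malaga</Name></CS_Module>
</Modules></CS_Application>
<CS_Application><Name>UK</Name><Modules>
<CS_Module><Name>london</Name></CS_Module>
<CS_Module><Name>liverpool</Name></CS_Module>
</Modules></CS_Application>
</Applications>
</Applications_GetResponse>
</soap:Body>
</soap:Envelope>"""

# The default namespace declared on Applications_GetResponse is inherited by
# its children, so every lookup must use it. "c" is an arbitrary local prefix.
NS = {"c": "http://www.country.com"}


def countries_with_cities(xml_text):
    """Return a list of (country, [cities]) tuples in document order."""
    root = ET.fromstring(xml_text)
    result = []
    # iter() walks the whole tree, so the SOAP wrapper elements can be ignored.
    for app in root.iter("{http://www.country.com}CS_Application"):
        # A direct-child lookup picks the application's own <Name>,
        # not the <Name> elements nested inside its modules.
        country = app.findtext("c:Name", namespaces=NS)
        cities = [module.findtext("c:Name", namespaces=NS)
                  for module in app.findall("c:Modules/c:CS_Module", NS)]
        result.append((country, cities))
    return result


for country, cities in countries_with_cities(SAMPLE):
    print("The Country: " + country + " - Cities: " + ",".join(cities))
```

With a real response you would pass response_apply.text instead of SAMPLE. Unlike the regex version, this does not depend on whitespace between tags or on the order of child elements inside each CS_Application.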
