I have a text file with entries like this:
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <soap:Body> <Applications_GetResponse xmlns="http://www.country.com"> <Applications> <CS_Application> <Name>Spain</Name> <Key>2345364564</Key> <Status>NORMAL</Status> <Modules> <CS_Module> <Name>zaragoza</Name> <Key>8743249725</Key> <DevelopmentEffort>0</DevelopmentEffort> <LogicalDBConnections/> </CS_Module> <CS_Module> <Name>malaga</Name> <Key>8743249725</Key> <DevelopmentEffort>0</DevelopmentEffort> <LogicalDBConnections/> </CS_Module> </Modules> <CreatedBy>7</CreatedBy> </CS_Application> <CS_Application> <Name>UK</Name> <Key>2345364564</Key> <Status>NORMAL</Status> <Modules> <CS_Module> <Name>london</Name> <Key>8743249725</Key> <DevelopmentEffort>0</DevelopmentEffort> <LogicalDBConnections/> </CS_Module> <CS_Module> <Name>liverpool</Name> <Key>8743249725</Key> <DevelopmentEffort>0</DevelopmentEffort> <LogicalDBConnections/> </CS_Module> </Modules> <CreatedBy>7</CreatedBy> </CS_Application> </Applications> </Applications_GetResponse> </soap:Body> </soap:Envelope>
I would like to parse it and get each country name followed by its cities.
I tried a few things with Python's re.findall, but I couldn't get that result:
print("HERE APPLICATIONS")
applications = re.findall('<CS_Application><Name>(.*?)</Name>', response_apply.text)
print(applications)
print("HERE MODULES")
modules = re.findall('<CS_Module><Name>(.*?)</Name>', response_apply.text)
print(modules)
This returns:
host-10$ sudo python3 capture.py
HERE APPLICATIONS
['Spain', 'UK']
HERE MODULES
['zaragoza', 'malaga', 'london', 'liverpool']
The expected result would look like this:
HERE
The Country: Spain - Cities: zaragoza,malaga
The Country: UK - Cities: london,liverpool
Regex is not a good tool for parsing XML; it is better to use an XML parser. If you still want a regex solution, I hope the code below helps.
import re
s = """\n<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">\n <soap:Body>\n <Applications_GetResponse xmlns="http://www.country.com">\n <Applications>\n <CS_Application>\n <Name>Spain</Name>\n <Key>2345364564</Key>\n <Status>NORMAL</Status>\n <Modules>\n <CS_Module>\n <Name>zaragoza</Name>\n <Key>8743249725</Key>\n <DevelopmentEffort>0</DevelopmentEffort>\n <LogicalDBConnections/>\n </CS_Module>\n <CS_Module>\n <Name>malaga</Name>\n <Key>8743249725</Key>\n <DevelopmentEffort>0</DevelopmentEffort>\n <LogicalDBConnections/>\n </CS_Module>\n </Modules>\n <CreatedBy>7</CreatedBy>\n </CS_Application>\n <CS_Application>\n <Name>UK</Name>\n <Key>2345364564</Key>\n <Status>NORMAL</Status>\n <Modules>\n <CS_Module>\n <Name>london</Name>\n <Key>8743249725</Key>\n <DevelopmentEffort>0</DevelopmentEffort>\n <LogicalDBConnections/>\n </CS_Module>\n <CS_Module>\n <Name>liverpool</Name>\n <Key>8743249725</Key>\n <DevelopmentEffort>0</DevelopmentEffort>\n <LogicalDBConnections/>\n </CS_Module>\n </Modules>\n <CreatedBy>7</CreatedBy>\n </CS_Application>\n </Applications>\n </Applications_GetResponse>\n </soap:Body>\n</soap:Envelope>\n"""
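# Capture the body of each <CS_Application> block ([\s\S] also matches newlines).
pattern1 = re.compile(r'<CS_Application>([\s\S]*?)</CS_Application>')
# Non-greedy, so each <Name> is captured separately even when the XML is on one line.
pattern2 = re.compile(r'<Name>(.*?)</Name>')
for m in pattern1.finditer(s):
    ss = m.group(1)
    res = []
    for mm in pattern2.finditer(ss):
        res.append(mm.group(1))
    # The first <Name> in a block is the country; the rest are its cities.
    print("The Country: " + res[0] + " - Cities: " + ",".join(res[1:]))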
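For comparison, here is a minimal sketch of the XML-parser route using the standard library's xml.etree.ElementTree. The function name countries_with_cities and the shortened sample string are made up for illustration; the namespace URI is the one from the xmlns attribute on Applications_GetResponse in the question. It assumes the response is well-formed XML with the same element layout as the sample.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on <Applications_GetResponse> in the sample.
# The "c" prefix is an arbitrary local alias used only in the paths below.
NS = {"c": "http://www.country.com"}


def countries_with_cities(xml_text):
    """Return a list of (country, [cities]) pairs from the SOAP response text."""
    root = ET.fromstring(xml_text)
    result = []
    # iter() walks the whole tree, so the SOAP wrapper elements can be ignored.
    for app in root.iter("{http://www.country.com}CS_Application"):
        # Only the direct <Name> child of the application is the country name.
        country = app.findtext("c:Name", default="", namespaces=NS)
        cities = [
            module.findtext("c:Name", default="", namespaces=NS)
            for module in app.findall("c:Modules/c:CS_Module", NS)
        ]
        result.append((country, cities))
    return result


# Shortened version of the sample from the question (hypothetical test input).
sample = """<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
<soap:Body><Applications_GetResponse xmlns="http://www.country.com"><Applications>
<CS_Application><Name>Spain</Name><Modules>
<CS_Module><Name>zaragoza</Name></CS_Module>
<CS_Module><Name>malaga</Name></CS_Module>
</Modules></CS_Application>
<CS_Application><Name>UK</Name><Modules>
<CS_Module><Name>london</Name></CS_Module>
<CS_Module><Name>liverpool</Name></CS_Module>
</Modules></CS_Application>
</Applications></Applications_GetResponse></soap:Body></soap:Envelope>"""

for country, cities in countries_with_cities(sample):
    print("The Country: " + country + " - Cities: " + ",".join(cities))
```

With the real response you would probably pass response_apply.text instead of sample. Unlike the regex version, this does not depend on whitespace between tags or on the order of the <Name> elements inside each block.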