
Python re.findall organize list

I have a text file with entries like this:

<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <soap:Body>
      <Applications_GetResponse xmlns="http://www.country.com">
         <Applications>
            <CS_Application>
               <Name>Spain</Name>
               <Key>2345364564</Key>
               <Status>NORMAL</Status>
               <Modules>
                  <CS_Module>
                     <Name>zaragoza</Name>
                     <Key>8743249725</Key>
                     <DevelopmentEffort>0</DevelopmentEffort>
                     <LogicalDBConnections/>
                  </CS_Module>
                  <CS_Module>
                     <Name>malaga</Name>
                     <Key>8743249725</Key>
                     <DevelopmentEffort>0</DevelopmentEffort>
                     <LogicalDBConnections/>
                  </CS_Module>
               </Modules>
               <CreatedBy>7</CreatedBy>
            </CS_Application>
            <CS_Application>
               <Name>UK</Name>
               <Key>2345364564</Key>
               <Status>NORMAL</Status>
               <Modules>
                  <CS_Module>
                     <Name>london</Name>
                     <Key>8743249725</Key>
                     <DevelopmentEffort>0</DevelopmentEffort>
                     <LogicalDBConnections/>
                  </CS_Module>
                  <CS_Module>
                     <Name>liverpool</Name>
                     <Key>8743249725</Key>
                     <DevelopmentEffort>0</DevelopmentEffort>
                     <LogicalDBConnections/>
                  </CS_Module>
               </Modules>
               <CreatedBy>7</CreatedBy>
            </CS_Application>
         </Applications>
      </Applications_GetResponse>
   </soap:Body>
</soap:Envelope>

I would like to parse it and get each country's name followed by the names of its cities.

I tried a few things with Python's re.findall, but I couldn't get anything close to the result I want:

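import re

# response_apply is the HTTP response whose .text holds the SOAP XML
# \s* tolerates whitespace/newlines between the opening tag and <Name>
print("HERE APPLICATIONS")
applications = re.findall(r'<CS_Application>\s*<Name>(.*?)</Name>', response_apply.text)
print(applications)
print("HERE MODULES")
modules = re.findall(r'<CS_Module>\s*<Name>(.*?)</Name>', response_apply.text)
print(modules)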

Output:

host-10$ sudo python3 capture.py 
HERE APPLICATIONS
['Spain', 'UK']
HERE MODULES
['zaragoza', 'malaga', 'london', 'liverpool']

The result I expect looks like this:

HERE
The Country: Spain - Cities: zaragoza,malaga
The Country: UK - Cities: london,liverpool
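The two calls in the question return two unrelated flat lists because of how `re.findall` shapes its result: with exactly one capturing group it returns one string per match, with several groups it returns one tuple per match, and with no groups it returns the whole matched text. Nothing in either list records which `<CS_Application>` a `<CS_Module>` belonged to. A small illustration, using a made-up sample string rather than the real SOAP response:

```python
import re

# Hypothetical sample text for illustration, not the real SOAP response
text = "<a><Name>Spain</Name><Key>1</Key></a><a><Name>UK</Name><Key>2</Key></a>"

# One capturing group -> list of strings
names = re.findall(r"<Name>(.*?)</Name>", text)
print(names)  # ['Spain', 'UK']

# Two capturing groups -> list of tuples, one tuple per match
pairs = re.findall(r"<Name>(.*?)</Name><Key>(.*?)</Key>", text)
print(pairs)  # [('Spain', '1'), ('UK', '2')]

# No capturing group -> list of whole matches
tags = re.findall(r"<Key>\d+</Key>", text)
print(tags)  # ['<Key>1</Key>', '<Key>2</Key>']
```

To keep cities attached to their country, the search has to be scoped to one application block at a time rather than run over the whole document.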

Regular expressions are not a good tool for parsing XML; an XML parser is a better choice. If you still want a regex solution, the code below should help.

import re

s = """\n<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">\n   <soap:Body>\n      <Applications_GetResponse xmlns="http://www.country.com">\n         <Applications>\n            <CS_Application>\n               <Name>Spain</Name>\n               <Key>2345364564</Key>\n               <Status>NORMAL</Status>\n               <Modules>\n                  <CS_Module>\n                     <Name>zaragoza</Name>\n                     <Key>8743249725</Key>\n                     <DevelopmentEffort>0</DevelopmentEffort>\n                     <LogicalDBConnections/>\n                  </CS_Module>\n                  <CS_Module>\n                     <Name>malaga</Name>\n                     <Key>8743249725</Key>\n                     <DevelopmentEffort>0</DevelopmentEffort>\n                     <LogicalDBConnections/>\n                  </CS_Module>\n               </Modules>\n               <CreatedBy>7</CreatedBy>\n            </CS_Application>\n            <CS_Application>\n               <Name>UK</Name>\n               <Key>2345364564</Key>\n               <Status>NORMAL</Status>\n               <Modules>\n                  <CS_Module>\n                     <Name>london</Name>\n                     <Key>8743249725</Key>\n                     <DevelopmentEffort>0</DevelopmentEffort>\n                     <LogicalDBConnections/>\n                  </CS_Module>\n                  <CS_Module>\n                     <Name>liverpool</Name>\n                     <Key>8743249725</Key>\n                     <DevelopmentEffort>0</DevelopmentEffort>\n                     <LogicalDBConnections/>\n                  </CS_Module>\n               </Modules>\n               <CreatedBy>7</CreatedBy>\n            </CS_Application>\n        </Applications>\n      </Applications_GetResponse>\n   </soap:Body>\n</soap:Envelope>\n"""
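# Capture the body of each <CS_Application> element; [\s\S] also matches newlines
pattern1 = re.compile(r'<CS_Application>([\s\S]*?)</CS_Application>')
# Non-greedy capture of every <Name> value inside that body
pattern2 = re.compile(r'<Name>(.*?)</Name>')

print("HERE")
for m in pattern1.finditer(s):
    ss = m.group(1)
    # <Name> values in document order: the country first, then each module (city)
    res = [mm.group(1) for mm in pattern2.finditer(ss)]
    print("The Country: " + res[0] + " - Cities: " + ",".join(res[1:]))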
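As the answer says, an XML parser is the sturdier option: it is not thrown off by whitespace, attributes, or extra nesting the way a regex can be. Below is a minimal sketch using the standard-library `xml.etree.ElementTree`, assuming the response keeps the structure shown in the question. Note that `<CS_Application>` and everything inside it inherit the default namespace `http://www.country.com` declared on `<Applications_GetResponse>`, so lookups must be namespace-qualified. `SAMPLE` is a trimmed copy of the question's XML (the `Key`, `Status`, and similar elements are omitted), and `countries_and_cities` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on <Applications_GetResponse> in the question's XML
NS = {"c": "http://www.country.com"}

# Trimmed copy of the question's sample response (extra elements omitted)
SAMPLE = """<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Body>
    <Applications_GetResponse xmlns="http://www.country.com">
      <Applications>
        <CS_Application>
          <Name>Spain</Name>
          <Modules>
            <CS_Module><Name>zaragoza</Name></CS_Module>
            <CS_Module><Name>malaga</Name></CS_Module>
          </Modules>
        </CS_Application>
        <CS_Application>
          <Name>UK</Name>
          <Modules>
            <CS_Module><Name>london</Name></CS_Module>
            <CS_Module><Name>liverpool</Name></CS_Module>
          </Modules>
        </CS_Application>
      </Applications>
    </Applications_GetResponse>
  </soap:Body>
</soap:Envelope>"""


def countries_and_cities(xml_text):
    """Return a list of (country, [city, ...]) tuples in document order."""
    root = ET.fromstring(xml_text)
    result = []
    # './/' searches the whole tree; the 'c:' prefix is resolved through NS
    for app in root.iterfind(".//c:CS_Application", NS):
        # Direct <Name> child of the application is the country
        country = app.findtext("c:Name", namespaces=NS)
        # <Name> of each module under <Modules> is a city
        cities = [mod.findtext("c:Name", namespaces=NS)
                  for mod in app.iterfind("c:Modules/c:CS_Module", NS)]
        result.append((country, cities))
    return result


print("HERE")
for country, cities in countries_and_cities(SAMPLE):
    print(f"The Country: {country} - Cities: {','.join(cities)}")
```

With the real response you would pass `response_apply.text` instead of `SAMPLE`. On Python 3.8 and later, a path such as `"{*}Name"` matches the tag in any namespace, if you prefer not to hard-code the namespace URI.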

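Notice: Technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.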

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM