Parsing XML with Python 3

I have the following XML, which contains email configuration for various email service providers, and I'm trying to parse this information into a dict: hostname, is_ssl, port, protocol, etc.

<domains>
  <domain>
    <name>zoznam.sk</name>
    <description>Zoznam Slovakia</description>
    <service>
      <hostname>imap.zoznam.sk</hostname>
      <port>143</port>
      <protocol>IMAP</protocol>
      <authentication>PLAIN</authentication>
      <usernameIncludesDomain/>
    </service>
    <service>
      <hostname>smtp.zoznam.sk</hostname>
      <port>587</port>
      <protocol>SMTP</protocol>
      <authentication>PLAIN</authentication>
      <usernameIncludesDomain/>
    </service>
  </domain>
    <domain>
    <name>123mail.org</name>
    <description>123mail.org</description>
    <service>
      <hostname>imap.fastmail.com</hostname>
      <port>993</port>
      <protocol>IMAP</protocol>
      <ssl/>
      <requires/>
      <authentication>PLAIN</authentication>
      <usernameIncludesDomain/>
    </service>
    <service>
      <hostname>smtp.fastmail.com</hostname>
      <port>587</port>
      <protocol>SMTP</protocol>
      <ssl/>
      <requires/>
      <authentication>PLAIN</authentication>
      <usernameIncludesDomain/>
    </service>
  </domain>
    <domain>
    <name>Netvigator.com</name>
    <description>netvigator.com</description>
    <service>
      <hostname>corpmail1.netvigator.com</hostname>
      <port>995</port>
      <protocol>POP</protocol>
      <ssl/>
      <authentication>NONE</authentication>
      <usernameIncludesDomain/>
    </service>
    <service>
      <hostname>corpmail1.netvigator.com</hostname>
      <port>587</port>
      <protocol>SMTP</protocol>
      <ssl/>
      <authentication>NONE</authentication>
      <usernameIncludesDomain/>
    </service>
  </domain>
</domains>

I tried to parse the name as a test but could not get it to work. I'm new to Python.

import xml.etree.ElementTree as ET


configs_file = 'isp_list.xml'


def parseXML(xmlfile):
    # create element tree object
    tree = ET.parse(xmlfile)

    # get root element
    root = tree.getroot()

    # create empty list for configs items
    configs = []

    # iterate  items
    for item in root.findall('domains/domain'):
        value = item.get('name')

        # test
        print(value)

        # append news dictionary to items list
        configs.append(item)

    # return items list
    return configs

I appreciate your help. Thank you.
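
For reference, two things in the snippet above would explain the failure. `tree.getroot()` already returns the `<domains>` element, so the path `'domains/domain'` matches nothing. And `item.get('name')` reads an XML attribute, whereas `name` here is a child element. A minimal corrected sketch, parsing from a string so that it runs standalone (the `parse_names` function and the shortened sample are assumptions for illustration, not the asker's code):

```python
import xml.etree.ElementTree as ET

# Shortened sample in the same shape as isp_list.xml.
sample = """
<domains>
  <domain>
    <name>zoznam.sk</name>
    <description>Zoznam Slovakia</description>
  </domain>
  <domain>
    <name>123mail.org</name>
    <description>123mail.org</description>
  </domain>
</domains>
"""

def parse_names(xml_text):
    # fromstring() returns the root element, here <domains>.
    root = ET.fromstring(xml_text)
    # Paths are relative to the root, so 'domain' selects its direct children.
    # findtext() reads the text of the <name> child; get() would look for an attribute.
    return [domain.findtext('name') for domain in root.findall('domain')]

print(parse_names(sample))  # ['zoznam.sk', '123mail.org']
```

When reading from a file, `ET.parse(configs_file).getroot()` gives the same root element.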

You can still use bs4 to generate a dict.

For the if/else lines you could use a more compact syntax, e.g.

'ssl' : getattr(item.find('ssl'), 'text', 'N/A')
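
The one-liner works because `find` returns `None` when the element is missing, and `getattr` with a third argument returns that default instead of raising `AttributeError`. A minimal sketch of the idiom, using the standard library's `xml.etree.ElementTree` instead of bs4 so that it runs without extra packages (the sample fragment and the `text_or_default` helper name are made up for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one service with <ssl/>, one without.
fragment = """
<domain>
  <service><hostname>imap.example.org</hostname><ssl/></service>
  <service><hostname>smtp.example.org</hostname></service>
</domain>
"""

def text_or_default(parent, tag, default='N/A'):
    # find() returns None for a missing child; None has no .text,
    # so getattr falls back to the default.
    return getattr(parent.find(tag), 'text', default)

services = ET.fromstring(fragment).findall('service')
print(text_or_default(services[0], 'hostname'))  # imap.example.org
print(text_or_default(services[1], 'ssl'))       # N/A
print(text_or_default(services[0], 'ssl'))       # None
```

Note that in ElementTree an empty element such as `<ssl/>` has `text` equal to `None`, so the default only applies when the element is absent altogether.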

Script:

from bs4 import BeautifulSoup as bs
xml = '''
<domains>
  <domain>
    <name>zoznam.sk</name>
    <description>Zoznam Slovakia</description>
    <service>
      <hostname>imap.zoznam.sk</hostname>
      <port>143</port>
      <protocol>IMAP</protocol>
      <authentication>PLAIN</authentication>
      <usernameIncludesDomain/>
    </service>
    <service>
      <hostname>smtp.zoznam.sk</hostname>
      <port>587</port>
      <protocol>SMTP</protocol>
      <authentication>PLAIN</authentication>
      <usernameIncludesDomain/>
    </service>
  </domain>
    <domain>
    <name>123mail.org</name>
    <description>123mail.org</description>
    <service>
      <hostname>imap.fastmail.com</hostname>
      <port>993</port>
      <protocol>IMAP</protocol>
      <ssl/>
      <requires/>
      <authentication>PLAIN</authentication>
      <usernameIncludesDomain/>
    </service>
    <service>
      <hostname>smtp.fastmail.com</hostname>
      <port>587</port>
      <protocol>SMTP</protocol>
      <ssl/>
      <requires/>
      <authentication>PLAIN</authentication>
      <usernameIncludesDomain/>
    </service>
  </domain>
    <domain>
    <name>Netvigator.com</name>
    <description>netvigator.com</description>
    <service>
      <hostname>corpmail1.netvigator.com</hostname>
      <port>995</port>
      <protocol>POP</protocol>
      <ssl/>
      <authentication>NONE</authentication>
      <usernameIncludesDomain/>
    </service>
    <service>
      <hostname>corpmail1.netvigator.com</hostname>
      <port>587</port>
      <protocol>SMTP</protocol>
      <ssl/>
      <authentication>NONE</authentication>
      <usernameIncludesDomain/>
    </service>
  </domain>
</domains>

'''

data = {}
soup = bs(xml, 'lxml')

for domain in soup.select('domain'):
    name = domain.select_one('name').text
    data[name] = {
        'name' : name,
        'desc' : domain.select_one('description').text,  
        'services' : {}
    }
    i = 1
    for item in domain.select('service'):
        service = {
                    'hostname' : item.select_one('hostname').text if item.select_one('hostname') else 'N/A',
                    'port' : item.select_one('port').text if item.select_one('port') else 'N/A',
                    'protocol' : item.select_one('protocol').text if item.select_one('protocol') else 'N/A',
                    'ssl' : item.select_one('ssl').text if item.select_one('ssl') else 'N/A',
                    'requires' : item.select_one('requires').text if item.select_one('requires') else 'N/A',
                    'authentication' : item.select_one('authentication').text if item.select_one('authentication') else 'N/A',
                    'usernameincludesdomain' : item.select_one('usernameincludesdomain').text if item.select_one('usernameincludesdomain') else 'N/A'
        }
        data[name]['services'][str(i)] = service
        i+=1
print(data)
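
Elements such as `<ssl/>`, `<requires/>` and `<usernameIncludesDomain/>` are empty, so their text says nothing beyond the fact that they are present. If a yes/no value is more useful than a string, one option is to test for presence instead. A sketch using the standard library's `xml.etree.ElementTree` (the `service_to_dict` helper and the two tuple names are illustrative, not part of the script above):

```python
import xml.etree.ElementTree as ET

# One <service> entry copied from the question.
service_xml = """
<service>
  <hostname>imap.fastmail.com</hostname>
  <port>993</port>
  <protocol>IMAP</protocol>
  <ssl/>
  <requires/>
  <authentication>PLAIN</authentication>
  <usernameIncludesDomain/>
</service>
"""

TEXT_FIELDS = ('hostname', 'port', 'protocol', 'authentication')
FLAG_FIELDS = ('ssl', 'requires', 'usernameIncludesDomain')

def service_to_dict(service):
    # Text fields: findtext() returns the default when the tag is absent.
    result = {tag: service.findtext(tag, default='N/A') for tag in TEXT_FIELDS}
    # Flag fields: the element is empty, so record only whether it exists.
    result.update({tag: service.find(tag) is not None for tag in FLAG_FIELDS})
    return result

print(service_to_dict(ET.fromstring(service_xml)))
```

Looping over the field names also avoids repeating the same if/else expression once per key.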


If you are literally converting XML to a JSON-like structure, maybe a library like untangle would work?

If you only need to get the names, you can easily use BeautifulSoup:

from bs4 import BeautifulSoup

s='''<domains>
  <domain>
    <name>zoznam.sk</name>
    <description>Zoznam Slovakia</description>
    <service>
      <hostname>imap.zoznam.sk</hostname>
      <port>143</port>
      <protocol>IMAP</protocol>
      <authentication>PLAIN</authentication>
      <usernameIncludesDomain/>
    </service>
    <service>
      <hostname>smtp.zoznam.sk</hostname>
      <port>587</port>
      <protocol>SMTP</protocol>
      <authentication>PLAIN</authentication>
      <usernameIncludesDomain/>
    </service>
  </domain>
    <domain>
    <name>123mail.org</name>
    <description>123mail.org</description>
    <service>
      <hostname>imap.fastmail.com</hostname>
      <port>993</port>
      <protocol>IMAP</protocol>
      <ssl/>
      <requires/>
      <authentication>PLAIN</authentication>
      <usernameIncludesDomain/>
    </service>
    <service>
      <hostname>smtp.fastmail.com</hostname>
      <port>587</port>
      <protocol>SMTP</protocol>
      <ssl/>
      <requires/>
      <authentication>PLAIN</authentication>
      <usernameIncludesDomain/>
    </service>
  </domain>
    <domain>
    <name>Netvigator.com</name>
    <description>netvigator.com</description>
    <service>
      <hostname>corpmail1.netvigator.com</hostname>
      <port>995</port>
      <protocol>POP</protocol>
      <ssl/>
      <authentication>NONE</authentication>
      <usernameIncludesDomain/>
    </service>
    <service>
      <hostname>corpmail1.netvigator.com</hostname>
      <port>587</port>
      <protocol>SMTP</protocol>
      <ssl/>
      <authentication>NONE</authentication>
      <usernameIncludesDomain/>
    </service>
  </domain>
</domains>'''

soup = BeautifulSoup(s, 'html.parser')

configs = [n.text for n in soup.find_all('name')]

and you get:

['zoznam.sk', '123mail.org', 'Netvigator.com']

To get information for each service, you can add this code:

soup = BeautifulSoup(s, 'html.parser')

configs = {}

services = soup.find_all('service')

for serv in services:
    hostname = serv.find('hostname').text
    configs[hostname] = {}
    configs[hostname]['port'] = serv.find('port').text
    configs[hostname]['protocol'] = serv.find('protocol').text
    configs[hostname]['auth'] = serv.find('authentication').text

and you get configs, which is a dictionary of dictionaries:

{'imap.zoznam.sk': {'port': '143', 'protocol': 'IMAP', 'auth': 'PLAIN'},
 'smtp.zoznam.sk': {'port': '587', 'protocol': 'SMTP', 'auth': 'PLAIN'},
 'imap.fastmail.com': {'port': '993', 'protocol': 'IMAP', 'auth': 'PLAIN'},
 'smtp.fastmail.com': {'port': '587', 'protocol': 'SMTP', 'auth': 'PLAIN'},
 'corpmail1.netvigator.com': {'port': '587', 'protocol': 'SMTP', 'auth': 'NONE'}}
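
Note that keying by hostname drops entries: `corpmail1.netvigator.com` serves both POP and SMTP, and only the last of the two survives in the output above. One possible way around this, sketched here with the standard library's `xml.etree.ElementTree` rather than BeautifulSoup, is to key by domain name and keep the services in a list (the `parse_domains` name and the trimmed sample are illustrative assumptions):

```python
import xml.etree.ElementTree as ET

# Trimmed sample: the domain from the question whose services share a hostname.
sample = """
<domains>
  <domain>
    <name>Netvigator.com</name>
    <description>netvigator.com</description>
    <service>
      <hostname>corpmail1.netvigator.com</hostname>
      <port>995</port>
      <protocol>POP</protocol>
      <ssl/>
      <authentication>NONE</authentication>
    </service>
    <service>
      <hostname>corpmail1.netvigator.com</hostname>
      <port>587</port>
      <protocol>SMTP</protocol>
      <ssl/>
      <authentication>NONE</authentication>
    </service>
  </domain>
</domains>
"""

def parse_domains(xml_text):
    configs = {}
    for domain in ET.fromstring(xml_text).findall('domain'):
        # A list keeps every service, even when hostnames repeat.
        configs[domain.findtext('name')] = [
            {
                'hostname': service.findtext('hostname'),
                'port': int(service.findtext('port')),
                'protocol': service.findtext('protocol'),
                'auth': service.findtext('authentication'),
            }
            for service in domain.findall('service')
        ]
    return configs

configs = parse_domains(sample)
print(len(configs['Netvigator.com']))  # 2
```

Converting the port with `int` is a choice made for this sketch; it assumes every `<service>` has a numeric `<port>`.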
