
How to iterate over XML tags in Python using ElementTree & save to CSV

I am trying to iterate over all nodes and child nodes in a tree using ElementTree. I would like to get all the parent and child XML tags as columns and values, with the child nodes appended to their parent, in CSV format. I am using Python 2.7. The header should be printed only once, with the respective values below it.

XML file:

<Customers>  
<Customer CustomerID="GREAL">  
      <CompanyName>Great Lakes Food Market</CompanyName>  
      <ContactName>Howard Snyder</ContactName>  
      <ContactTitle>Marketing Manager</ContactTitle>  
      <Phone>(503) 555-7555</Phone>  
      <FullAddress>  
        <Address>2732 Baker Blvd.</Address>  
        <City>Eugene</City>  
        <Region>OR</Region>  
        <PostalCode>97403</PostalCode>  
        <Country>USA</Country>  
      </FullAddress>  
 </Customer>  
    <Customer CustomerID="HUNGC">  
      <CompanyName>Hungry Coyote Import Store</CompanyName>  
      <ContactName>Yoshi Latimer</ContactName>  
      <ContactTitle>Sales Representative</ContactTitle>  
      <Phone>(503) 555-6874</Phone>  
      <Fax>(503) 555-2376</Fax>  
      <FullAddress>  
        <Address>City Center Plaza 516 Main St.</Address>  
        <City>Elgin</City>  
        <Region>OR</Region>  
        <PostalCode>97827</PostalCode>  
        <Country>USA</Country>  
      </FullAddress>  
    </Customer>  
    <Customer CustomerID="LAZYK">  
      <CompanyName>Lazy K Kountry Store</CompanyName>  
      <ContactName>John Steel</ContactName>  
      <ContactTitle>Marketing Manager</ContactTitle>  
      <Phone>(509) 555-7969</Phone>  
      <Fax>(509) 555-6221</Fax>  
      <FullAddress>  
        <Address>12 Orchestra Terrace</Address>  
        <City>Walla Walla</City>  
        <Region>WA</Region>  
        <PostalCode>99362</PostalCode>  
        <Country>USA</Country>  
      </FullAddress>  
    </Customer>  
    <Customer CustomerID="LETSS">  
      <CompanyName>Let's Stop N Shop</CompanyName>  
      <ContactName>Jaime Yorres</ContactName>  
      <ContactTitle>Owner</ContactTitle>  
      <Phone>(415) 555-5938</Phone>  
      <FullAddress>  
        <Address>87 Polk St. Suite 5</Address>  
        <City>San Francisco</City>  
        <Region>CA</Region>  
        <PostalCode>94117</PostalCode>  
        <Country>USA</Country>  
      </FullAddress>  
    </Customer>  
  </Customers>  

My code:

#Import Libraries
import csv
import xmlschema
import xml.etree.ElementTree as ET

#Define the variable to store the XML Document
xml_file = 'C:/Users/391648/Desktop/BOSS_20190618_20190516_18062019141928_CUMA/source_Files_XML/CustomersOrders.xml'

#using XML Schema Library validate the XML against XSD
my_schema = xmlschema.XMLSchema('C:/Users/391648/Desktop/BOSS_20190618_20190516_18062019141928_CUMA/source_Files_XML/CustomersOrders.xsd')
SchemaCheck = my_schema.is_valid(xml_file)
print(SchemaCheck) #Prints as True if the document is validated with XSD

#Parse XML & get root
tree = ET.parse(xml_file)
root = tree.getroot()

#Create & Open CSV file
xml_data_to_csv = open('C:/Users/391648/Desktop/BOSS_20190618_20190516_18062019141928_CUMA/source_Files_XML/PythonXMl.csv','w')

#create variable to write to csv
csvWriter = csv.writer(xml_data_to_csv)

#Create list contains header
count =0

#Loop for each node
for element in root.findall('Customers/Customer'):
    List_nodes = []

    #Get head by Tag
    if count ==0:
        list_header =[]
        Full_Address = []
        CompanyName = element.find('CompanyName').tag
        list_header.append(CompanyName)

        ContactName = element.find('ContactName').tag
        list_header.append(ContactName)

        ContactTitle = element.find('ContactTitle').tag
        list_header.append(ContactTitle)

        Phone = element.find('Phone').tag
        list_header.append(Phone)

        print(list_header)
        csvWriter.writerow(list_header)

        count = count + 1

    #Get the data of the Node
    CompanyName = element.find('CompanyName').text
    List_nodes.append(CompanyName)

    ContactName = element.find('ContactName').text
    List_nodes.append(ContactName)

    ContactTitle = element.find('ContactTitle').text
    List_nodes.append(ContactTitle)

    Phone = element.find('Phone').text
    List_nodes.append(Phone)

    print(List_nodes)

    #Write List_Nodes to CSV
    csvWriter.writerow(List_nodes)

xml_data_to_csv.close()

Expected CSV output:

CompanyName,ContactName,ContactTitle,Phone,Address,City,Region,PostalCode,Country
Great Lakes Food Market,Howard Snyder,Marketing Manager,(503) 555-7555,2732 Baker Blvd.,Eugene,OR,97403,USA
Hungry Coyote Import Store,Yoshi Latimer,Sales Representative,(503) 555-6874,City Center Plaza 516 Main St.,Elgin,OR,97827,USA

You can use xmltodict to convert the XML into nested Python dictionaries (a JSON-like structure) instead of walking the element tree yourself:

import xmltodict
import pandas as pd

with open('data.xml', 'r') as f:
    data = xmltodict.parse(f.read())['Customers']['Customer']

data_pd = {'CompanyName': [i['CompanyName'] for i in data],
           'ContactName': [i['ContactName'] for i in data],
           'ContactTitle': [i['ContactTitle'] for i in data],
           'Phone': [i['Phone'] for i in data],
           'Address': [i['FullAddress']['Address'] for i in data],
           'City': [i['FullAddress']['City'] for i in data],
           'Region': [i['FullAddress']['Region'] for i in data],
           'PostalCode': [i['FullAddress']['PostalCode'] for i in data],
           'Country': [i['FullAddress']['Country'] for i in data]}

df = pd.DataFrame(data_pd)
df.to_csv('result.csv', index=False)

Output CSV file:

CompanyName,ContactName,ContactTitle,Phone,Address,City,Region,PostalCode,Country
Great Lakes Food Market,Howard Snyder,Marketing Manager,(503) 555-7555,2732 Baker Blvd.,Eugene,OR,97403,USA
Hungry Coyote Import Store,Yoshi Latimer,Sales Representative,(503) 555-6874,City Center Plaza 516 Main St.,Elgin,OR,97827,USA
Lazy K Kountry Store,John Steel,Marketing Manager,(509) 555-7969,12 Orchestra Terrace,Walla Walla,WA,99362,USA
Let's Stop N Shop,Jaime Yorres,Owner,(415) 555-5938,87 Polk St. Suite 5,San Francisco,CA,94117,USA

You might be better off using lxml. It has most of the functionality you need for finding elements built in.

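from lxml import etree
import csv

# Parse the file; the root element is <Customers>
xml = etree.parse('file.xml').getroot()

# CSV column name -> path of the element, relative to <Customer>
# (dicts keep insertion order on Python 3.7+, which fixes the column order)
field_dict = {
    'CompanyName': 'CompanyName',
    'ContactName': 'ContactName',
    'ContactTitle': 'ContactTitle',
    'Phone': 'Phone',
    'Address': 'FullAddress/Address',
    'City': 'FullAddress/City',
    'Region': 'FullAddress/Region',
    'PostalCode': 'FullAddress/PostalCode',
    'Country': 'FullAddress/Country'
}

customers = []
for customer in xml.findall('Customer'):
    # findtext() returns None instead of raising if an element is missing
    line = {k: customer.findtext(v) for k, v in field_dict.items()}
    customers.append(line)

# Python 3: newline='' prevents extra blank lines on Windows; on Python 2.7 use open('customers.csv', 'wb')
with open('customers.csv', 'w', newline='') as fp:
    writer = csv.DictWriter(fp, fieldnames=list(field_dict))
    writer.writeheader()  # write the column names once, before the data rows
    writer.writerows(customers)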
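The path lookups used in field_dict (such as 'FullAddress/City') are also understood by the standard library's xml.etree.ElementTree, so if you would rather not install lxml, a minimal sketch along the same lines could look like this. It assumes the XML is available as a string (XML_TEXT is a trimmed, hypothetical sample) and writes to an in-memory buffer so it is easy to try; the helper name customers_to_csv is made up for illustration. Swap in ET.parse(path).getroot() and a real file for your data.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed sample of the question's XML (hypothetical test data)
XML_TEXT = """<Customers>
  <Customer CustomerID="GREAL">
    <CompanyName>Great Lakes Food Market</CompanyName>
    <Phone>(503) 555-7555</Phone>
    <FullAddress><City>Eugene</City><Country>USA</Country></FullAddress>
  </Customer>
  <Customer CustomerID="HUNGC">
    <CompanyName>Hungry Coyote Import Store</CompanyName>
    <Phone>(503) 555-6874</Phone>
    <FullAddress><City>Elgin</City><Country>USA</Country></FullAddress>
  </Customer>
</Customers>"""

# CSV column -> ElementPath expression relative to <Customer>
FIELDS = [
    ('CompanyName', 'CompanyName'),
    ('Phone', 'Phone'),
    ('City', 'FullAddress/City'),
    ('Country', 'FullAddress/Country'),
]


def customers_to_csv(xml_text):
    """Return CSV text with a header row and one row per <Customer>."""
    root = ET.fromstring(xml_text)  # root is <Customers>
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=[name for name, _ in FIELDS],
                            lineterminator='\n')
    writer.writeheader()
    for customer in root.findall('Customer'):
        # findtext returns None (written as an empty field) if the path is missing
        writer.writerow({name: customer.findtext(path) for name, path in FIELDS})
    return out.getvalue()


print(customers_to_csv(XML_TEXT), end='')
```

Listing the columns explicitly like this keeps the column order fixed regardless of which optional elements a given customer happens to have.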

A couple of things I have changed:

  • Removed schema validation since I do not have the XSD. You can add it back.
  • Made the child-node traversal dynamic instead of referring to each child node statically.
  • Changed the main for loop from for customer in root.findall('Customers/Customer') to for customer in root.findall('Customer'). getroot() already returns the <Customers> element and findall() paths are relative to it, so 'Customers/Customer' matched nothing.
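A quick check on a tiny, hypothetical document shows how findall() paths resolve relative to the element they are called on (the sample XML below is made up for illustration):

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the question's file (hypothetical data)
root = ET.fromstring(
    '<Customers>'
    '<Customer CustomerID="GREAL"/>'
    '<Customer CustomerID="HUNGC"/>'
    '</Customers>'
)

print(root.tag)  # -> Customers (the root element itself)

# Looks for <Customers> *inside* <Customers>, so nothing is found
print(len(root.findall('Customers/Customer')))  # -> 0

# Direct <Customer> children of the root
print([c.get('CustomerID') for c in root.findall('Customer')])  # -> ['GREAL', 'HUNGC']

# './/Customer' matches <Customer> elements at any depth below the root
print(len(root.findall('.//Customer')))  # -> 2
```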

Apart from that, I tried to keep your program structure and library usage intact. Here is the modified program:

import xml.etree.ElementTree as et
import csv

tree = et.parse("../data/customers.xml")
root = tree.getroot()  # root is the <Customers> element
headers = []
count = 0

# Python 3: newline='' prevents extra blank lines on Windows; on Python 2.7 use 'wb'
with open('../data/customers.csv', 'w', newline='') as xml_data_to_csv:
    csvWriter = csv.writer(xml_data_to_csv)
    for customer in root.findall('Customer'):
        data = []
        for detail in customer:
            if detail.tag == 'FullAddress':
                # Flatten the nested address parts into the customer's row
                for addresspart in detail:
                    data.append(addresspart.text.strip())
                    if count == 0:
                        headers.append(addresspart.tag)
            else:
                # strip() removes surrounding whitespace such as '\n' and '\r'
                data.append(detail.text.strip())
                if count == 0:
                    headers.append(detail.tag)
        # Headers come from the first customer and are written only once
        if count == 0:
            csvWriter.writerow(headers)
        csvWriter.writerow(data)
        count = count + 1

With the given input XML content it produces:

CompanyName,ContactName,ContactTitle,Phone,Address,City,Region,PostalCode,Country
Great Lakes Food Market,Howard Snyder,Marketing Manager,(503) 555-7555,2732 Baker Blvd.,Eugene,OR,97403,USA
Hungry Coyote Import Store,Yoshi Latimer,Sales Representative,(503) 555-6874,(503) 555-2376,City Center Plaza 516 Main St.,Elgin,OR,97827,USA
Lazy K Kountry Store,John Steel,Marketing Manager,(509) 555-7969,(509) 555-6221,12 Orchestra Terrace,Walla Walla,WA,99362,USA
Let's Stop N Shop,Jaime Yorres,Owner,(415) 555-5938,87 Polk St. Suite 5,San Francisco,CA,94117,USA

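Note: instead of writing to the CSV inside the loop, you can append the rows to a list and write them all at once with writerows(); which is better depends on your data size and performance needs. Also note that the header is taken from the first customer only: customers that have a Fax element (HUNGC and LAZYK) get an extra field, so their remaining values are shifted one column to the right of the header.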
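To give an optional element such as Fax a column of its own, one option (a sketch, not the only approach) is to collect every row as a dict first, build the header from all tags seen in first-seen order, and let csv.DictWriter fill missing values through its restval argument. The sample XML is a trimmed, hypothetical version of the question's data, and flatten_customer and customers_to_csv are illustrative names.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed sample: only the second customer has a <Fax> (hypothetical test data)
XML_TEXT = """<Customers>
  <Customer CustomerID="GREAL">
    <CompanyName>Great Lakes Food Market</CompanyName>
    <Phone>(503) 555-7555</Phone>
    <FullAddress><City>Eugene</City></FullAddress>
  </Customer>
  <Customer CustomerID="HUNGC">
    <CompanyName>Hungry Coyote Import Store</CompanyName>
    <Phone>(503) 555-6874</Phone>
    <Fax>(503) 555-2376</Fax>
    <FullAddress><City>Elgin</City></FullAddress>
  </Customer>
</Customers>"""


def flatten_customer(customer):
    """Map tag -> text for each child, expanding <FullAddress> one level."""
    row = {}
    for detail in customer:
        parts = list(detail) if detail.tag == 'FullAddress' else [detail]
        for part in parts:
            row[part.tag] = (part.text or '').strip()
    return row


def customers_to_csv(xml_text):
    root = ET.fromstring(xml_text)
    rows = [flatten_customer(c) for c in root.findall('Customer')]

    # Header = every tag seen in any row, in first-seen order
    headers = []
    for row in rows:
        for tag in row:
            if tag not in headers:
                headers.append(tag)

    out = io.StringIO()
    # restval='' fills in columns a row does not have (e.g. a missing Fax)
    writer = csv.DictWriter(out, fieldnames=headers, restval='',
                            lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)  # write all collected rows in one go
    return out.getvalue()


print(customers_to_csv(XML_TEXT), end='')
```

With first-seen ordering, Fax lands after City; if you want it next to Phone, list the columns explicitly instead, as the lxml answer does.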


Update: when you have customers and their orders in the XML

The structure of the XML processing and CSV writing code remains the same. Additionally, process the Orders element while processing each customer. Under Orders, each Order element can be processed exactly like a Customer. As you mentioned, each Order also has a ShipInfo element, which is flattened the same way as FullAddress.

The input XML is assumed to look like this (based on the comment below):

<Customers>
    <Customer CustomerID="GREAL">
        <CompanyName>Great Lakes Food Market</CompanyName>
        <ContactName>Howard Snyder</ContactName>
        <ContactTitle>Marketing Manager</ContactTitle>
        <Phone>(503) 555-7555</Phone>
        <FullAddress>
            <Address>2732 Baker Blvd.</Address>
            <City>Eugene</City>
            <Region>OR</Region>
            <PostalCode>97403</PostalCode>
            <Country>USA</Country>
        </FullAddress>
        <Orders>
            <Order>
                <Param1>Value1</Param1>
                <Param2>Value2</Param2>
                <ShipInfo>
                    <ShipInfoParam1>Value3</ShipInfoParam1>
                    <ShipInfoParam2>Value4</ShipInfoParam2>
                </ShipInfo>
            </Order>
            <Order>
                <Param1>Value5</Param1>
                <Param2>Value6</Param2>
                <ShipInfo>
                    <ShipInfoParam1>Value7</ShipInfoParam1>
                    <ShipInfoParam2>Value8</ShipInfoParam2>
                </ShipInfo>
            </Order>
        </Orders>
    </Customer>
    <Customer CustomerID="HUNGC">
        <CompanyName>Hungry Coyote Import Store</CompanyName>
        <ContactName>Yoshi Latimer</ContactName>
        <ContactTitle>Sales Representative</ContactTitle>
        <Phone>(503) 555-6874</Phone>
        <Fax>(503) 555-2376</Fax>
        <FullAddress>
            <Address>City Center Plaza 516 Main St.</Address>
            <City>Elgin</City>
            <Region>OR</Region>
            <PostalCode>97827</PostalCode>
            <Country>USA</Country>
        </FullAddress>
        <Orders>
            <Order>
                <Param1>Value7</Param1>
                <Param2>Value8</Param2>
                <ShipInfo>
                    <ShipInfoParam1>Value9</ShipInfoParam1>
                    <ShipInfoParam2>Value10</ShipInfoParam2>
                </ShipInfo>
            </Order>
        </Orders>
    </Customer>
    <Customer CustomerID="LAZYK">
        <CompanyName>Lazy K Kountry Store</CompanyName>
        <ContactName>John Steel</ContactName>
        <ContactTitle>Marketing Manager</ContactTitle>
        <Phone>(509) 555-7969</Phone>
        <Fax>(509) 555-6221</Fax>
        <FullAddress>
            <Address>12 Orchestra Terrace</Address>
            <City>Walla Walla</City>
            <Region>WA</Region>
            <PostalCode>99362</PostalCode>
            <Country>USA</Country>
        </FullAddress>
    </Customer>
    <Customer CustomerID="LETSS">
        <CompanyName>Let's Stop N Shop</CompanyName>
        <ContactName>Jaime Yorres</ContactName>
        <ContactTitle>Owner</ContactTitle>
        <Phone>(415) 555-5938</Phone>
        <FullAddress>
            <Address>87 Polk St. Suite 5</Address>
            <City>San Francisco</City>
            <Region>CA</Region>
            <PostalCode>94117</PostalCode>
            <Country>USA</Country>
        </FullAddress>
    </Customer>
</Customers>

Here is the modified code that processes both customers and orders:

import xml.etree.ElementTree as et
import csv

tree = et.parse("../data/customers-with-orders.xml")
root = tree.getroot()

# Python 3: newline='' prevents extra blank lines on Windows; on Python 2.7 use 'wb'
customer_csv = open('../data/customers-part.csv', 'w', newline='')
order_csv = open('../data/orders-part.csv', 'w', newline='')

customerCsvWriter = csv.writer(customer_csv)
orderCsvWriter = csv.writer(order_csv)

customerHeaders = []
orderHeaders = ['CustomerID']
isFirstCustomer = True
isFirstOrder = True


def processOrders(orders, customerId):
    """Write one orders-part.csv row per <Order> inside the given <Orders> element."""
    global isFirstOrder
    for order in orders.findall('Order'):
        # Prefix every order row with the owning customer's ID
        orderData = [customerId]
        for orderdetail in order:
            if orderdetail.tag == 'ShipInfo':
                # Flatten the nested ShipInfo parts into the order's row
                for shipinfopart in orderdetail:
                    orderData.append(shipinfopart.text.strip())
                    if isFirstOrder:
                        orderHeaders.append(shipinfopart.tag)
            else:
                orderData.append(orderdetail.text.strip())
                if isFirstOrder:
                    orderHeaders.append(orderdetail.tag)
        if isFirstOrder:
            orderCsvWriter.writerow(orderHeaders)
        orderCsvWriter.writerow(orderData)
        isFirstOrder = False


for customer in root.findall('Customer'):
    customerData = []
    customerId = customer.get('CustomerID')
    for detail in customer:
        if detail.tag == 'FullAddress':
            for addresspart in detail:
                customerData.append(addresspart.text.strip())
                if isFirstCustomer:
                    customerHeaders.append(addresspart.tag)
        elif detail.tag == 'Orders':
            # Pass the <Orders> element explicitly instead of relying on a global
            processOrders(detail, customerId)
        else:
            customerData.append(detail.text.strip())
            if isFirstCustomer:
                customerHeaders.append(detail.tag)
    if isFirstCustomer:
        customerCsvWriter.writerow(customerHeaders)
    customerCsvWriter.writerow(customerData)
    isFirstCustomer = False

customer_csv.close()
order_csv.close()

Output produced in customers-part.csv:

CompanyName,ContactName,ContactTitle,Phone,Address,City,Region,PostalCode,Country
Great Lakes Food Market,Howard Snyder,Marketing Manager,(503) 555-7555,2732 Baker Blvd.,Eugene,OR,97403,USA
Hungry Coyote Import Store,Yoshi Latimer,Sales Representative,(503) 555-6874,(503) 555-2376,City Center Plaza 516 Main St.,Elgin,OR,97827,USA
Lazy K Kountry Store,John Steel,Marketing Manager,(509) 555-7969,(509) 555-6221,12 Orchestra Terrace,Walla Walla,WA,99362,USA
Let's Stop N Shop,Jaime Yorres,Owner,(415) 555-5938,87 Polk St. Suite 5,San Francisco,CA,94117,USA

Output produced in orders-part.csv:

CustomerID,Param1,Param2,ShipInfoParam1,ShipInfoParam2
GREAL,Value1,Value2,Value3,Value4
GREAL,Value5,Value6,Value7,Value8
HUNGC,Value7,Value8,Value9,Value10

Note: the code can be optimized further by reusing the common flattening logic for customers and orders; I am leaving that part to you. Also notice that the customer ID is added to each order row so that every order can be traced back to its customer.
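One possible way to do that reuse is sketched below, under the assumption that the XML has the shape shown above. A single hypothetical helper, flatten(), turns an element into (tag, text) pairs, expanding the listed container tags one level (FullAddress for customers, ShipInfo for orders) and skipping others (Orders). The sample XML is trimmed and made up, and the output goes to in-memory buffers so the example is easy to run.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed sample in the shape assumed above (hypothetical test data)
XML_TEXT = """<Customers>
  <Customer CustomerID="GREAL">
    <CompanyName>Great Lakes Food Market</CompanyName>
    <FullAddress><City>Eugene</City></FullAddress>
    <Orders>
      <Order><Param1>Value1</Param1><ShipInfo><ShipInfoParam1>Value3</ShipInfoParam1></ShipInfo></Order>
      <Order><Param1>Value5</Param1><ShipInfo><ShipInfoParam1>Value7</ShipInfoParam1></ShipInfo></Order>
    </Orders>
  </Customer>
  <Customer CustomerID="LAZYK">
    <CompanyName>Lazy K Kountry Store</CompanyName>
    <FullAddress><City>Walla Walla</City></FullAddress>
  </Customer>
</Customers>"""


def flatten(element, expand=(), skip=()):
    """Return (tag, text) pairs for element's children.

    Children whose tag is in `expand` are replaced by their own children;
    children whose tag is in `skip` are left out entirely.
    """
    pairs = []
    for child in element:
        if child.tag in skip:
            continue
        parts = list(child) if child.tag in expand else [child]
        for part in parts:
            pairs.append((part.tag, (part.text or '').strip()))
    return pairs


def write_rows(writer, rows):
    """Write a header taken from the first row, then the values of every row."""
    for i, pairs in enumerate(rows):
        if i == 0:
            writer.writerow([tag for tag, _ in pairs])
        writer.writerow([text for _, text in pairs])


def split_customers_and_orders(xml_text):
    root = ET.fromstring(xml_text)
    customer_rows, order_rows = [], []
    for customer in root.findall('Customer'):
        customer_id = customer.get('CustomerID')
        customer_rows.append(
            flatten(customer, expand={'FullAddress'}, skip={'Orders'}))
        # 'Orders/Order' finds nothing for customers without <Orders>
        for order in customer.findall('Orders/Order'):
            order_rows.append([('CustomerID', customer_id)]
                              + flatten(order, expand={'ShipInfo'}))

    customers_out, orders_out = io.StringIO(), io.StringIO()
    write_rows(csv.writer(customers_out, lineterminator='\n'), customer_rows)
    write_rows(csv.writer(orders_out, lineterminator='\n'), order_rows)
    return customers_out.getvalue(), orders_out.getvalue()


customers_csv, orders_csv = split_customers_and_orders(XML_TEXT)
print(customers_csv, end='')
print(orders_csv, end='')
```

Like the original, this still takes each file's header from its first row, so it shares the caveat about optional elements such as Fax.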

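Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.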

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM