简体   繁体   中英

Creating an XML File from a dataframe using Python and LXML

I'm trying to create an XML file that uses a Pandas dataframe to populate the elements and subelements. Here is the code I have written:

import pandas as pd
from lxml import etree as et

df = pd.DataFrame({'id_profile': [439, 444654, 56454, 12222], 'ServiceDate': 
['2017-12-05', '2017-01-25', '2017-12-05', '2017-01-25'], 
'PrimaryServiceCategory': [25, 25, 33, 25]})

root = et.Element('ClientReport')
idnum = et.SubElement(root, 'ID')
prime_serv = et.SubElement(root, 'ServiceCategory')
serv_date = et.SubElement(root, 'ServiceDate')

for row in df.iterrows():
    idnum.text = df['id_profile']
    prime_serv.text = df['PrimaryServiceCategory']
    serv_date.text = df['ServiceDate']

print(et.tostring(root, pretty_print=True))

My expected outcome is:

<ClientReport>
   <ID>439</ID>
   <ServiceCategory>25</ServiceCategory>
   <ServiceDate>2017-12-05</ServiceDate>
</ClientReport>
<ClientReport>
   <ID>444654</ID>
   <ServiceCategory>25</ServiceCategory>
   <ServiceDate>2017-01-25</ServiceDate>
</ClientReport>
<ClientReport>
   <ID>12222</ID>
   <ServiceCategory>25</ServiceCategory>
   <ServiceDate>2017-01-25</ServiceDate>
</ClientReport>

Instead, I get:

TypeError: Argument must be bytes or unicode, got 'Series'

I'm not sure how to get iterated rows, rather than a static value, to populate the XML file's attributes. Is Pandas the right package here? And is LXML the right one as well?

One solution is to iterate over the dataframe, populating each XML element individually:

import pandas as pd
from lxml import etree as et

df = pd.DataFrame({'id_profile': [439, 444654, 56454, 12222], 'ServiceDate':
['2017-12-05', '2017-01-25', '2017-12-05', '2017-01-25'],
'PrimaryServiceCategory': [25, 25, 33, 25]})

root = et.Element('root')

for row in df.iterrows():
    report = et.SubElement(root, 'ClientReport')
    idnum = et.SubElement(report, 'ID')
    prime_serv = et.SubElement(report, 'ServiceCategory')
    serv_date = et.SubElement(report, 'ServiceDate')
    idnum.text = str(row[1]['id_profile'])
    prime_serv.text = str(row[1]['PrimaryServiceCategory'])
    serv_date.text = str(row[1]['ServiceDate'])

print(et.tostring(root, pretty_print=True).decode('utf-8'))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM