Batch export XML files to CSV using Python
I am new to Python, so please bear with me if my questions are silly. I have multiple XML files in the following format, and I would like to extract certain tags from them and export the values to a single CSV file.
Here is an example of the XML (c:\xml\1.xml):
<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet type="text/xsl" href="emotionStyleSheet_template.xsl"?>
<EmotionReport>
<VersionInformation>
<Version>8.2.0</Version>
</VersionInformation>
<DateTime>
<Date>18-10-2021</Date>
<Time>14-12-26</Time>
</DateTime>
<SourceInformation>
<File>
<FilePath>//nas/emotionxml</FilePath>
<FileName>file001.mxf</FileName>
<FileSize>9972536969</FileSize>
<FileAudioInformation>
<AudioDuration>1345.0</AudioDuration>
<SampleRate>48000</SampleRate>
<NumChannels>8</NumChannels>
<BitsPerSample>24</BitsPerSample>
<AudioSampleGroups>64560000</AudioSampleGroups>
<NumStreams>8</NumStreams>
<Container>Undefined Sound</Container>
<Description>IMC Nexio
</Description>
<StreamInformation>
<Stream>
<StreamNumber>1</StreamNumber>
<NumChannelsInStream>1</NumChannelsInStream>
<Channel>
<ChannelNumber>1</ChannelNumber>
<ChannelEncoding>PCM</ChannelEncoding>
</Channel>
</Stream>
<Stream>
<StreamNumber>2</StreamNumber>
<NumChannelsInStream>1</NumChannelsInStream>
<Channel>
<ChannelNumber>1</ChannelNumber>
<ChannelEncoding>PCM</ChannelEncoding>
</Channel>
</Stream>
</StreamInformation>
<FileTimecodeInformation>
<FrameRate>25.00</FrameRate>
<DropFrame>false</DropFrame>
<StartTimecode>00:00:00:00</StartTimecode>
</FileTimecodeInformation>
</FileAudioInformation>
</File>
</SourceInformation>
</EmotionReport>
Expected output (EmotionData.csv):
,Date,Time,FileName,Description,FileSize,FilePath
0,18-10-2021,14-12-26,file001.mxf,IMC Nexio,9972536969,//nas/emotionxml
1,13-10-2021,08-12-26,file002.mxf,IMC Nexio,3566536770,//nas/emotionxml
2,03-10-2021,02-09-21,file003.mxf,IMC Nexio,46357672,//nas/emotionxml
....
Here is the code I wrote based on what I learned from online resources (emotion_xml_parser.py):
import xml.etree.ElementTree as ET
import glob2
import pandas as pd

cols = ["Date", "Time", "FileName", "Description", "FileSize", "FilePath"]
rows = []

for filename in glob2.glob(r'C:\xml\*.xml'):
    xmlData = ET.parse(filename)
    rootXML = xmlData.getroot()
    for i in rootXML:
        Date = i.findall("Date").text
        Time = i.findall("Time").text
        FileName = i.findall("FileName").text
        Description = i.findall("Description").text
        FileSize = i.findall("FileSize").text
        FilePath = i.findall("FilePath").text
        row.append({"Date": Date,
                    "Time": Time,
                    "FileName": FileName,
                    "Description": Description,
                    "FileSize": FileSize,
                    "FilePath": FilePath,})

df = pd.DataFrame(rows,columns = cols)
# Write dataframe to csv
df.to_csv("EmotionData.csv")
I am receiving the following error when running the script:
File "c:\emtion_xml_parser.py", line 14, in <module>
Date = i.findall("Date").text
AttributeError: 'list' object has no attribute 'text'
TIA!
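The error occurs because findall() returns a list of matching elements, and a list has no .text attribute. Called with a plain tag name, it also searches only the direct children of the element it is called on. A better approach is to give the full path to each element you need and read it with findtext(), for example: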
import xml.etree.ElementTree as ET
import glob2
import pandas as pd

cols = ["Date", "Time", "FileName", "Description", "FileSize", "FilePath"]
rows = []

for filename in glob2.glob(r'*.xml'):
    xmlData = ET.parse(filename)
    root = xmlData.getroot()
    row = {}
    row['Date'] = root.findtext('DateTime/Date')
    row['Time'] = root.findtext('DateTime/Time')
    row['FileName'] = root.findtext('SourceInformation/File/FileName')
    row['Description'] = root.findtext('SourceInformation/File/FileAudioInformation/Description').strip()
    row['FileSize'] = root.findtext('SourceInformation/File/FileSize')
    row['FilePath'] = root.findtext('SourceInformation/File/FilePath')
    rows.append(row)

df = pd.DataFrame(rows, columns=cols)
# Write dataframe to csv
df.to_csv("EmotionData.csv")
This gives you:
,Date,Time,FileName,Description,FileSize,FilePath
0,18-10-2021,14-12-26,file001.mxf,IMC Nexio,9972536969,//nas/emotionxml
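As a hedged variation on the approach above, the sketch below uses only the standard library (`glob` and `csv` in place of `glob2` and pandas) and passes `default=""` to `findtext()`, so a report that lacks one of the tags yields an empty cell instead of an `AttributeError` on `.strip()`. The names `FIELDS`, `extract_row`, and `export_reports` are illustrative and not part of the original answer. Unlike `DataFrame.to_csv`, this version does not write a leading index column.

```python
import csv
import glob
import xml.etree.ElementTree as ET

# Column name -> element path relative to the <EmotionReport> root.
FIELDS = {
    "Date": "DateTime/Date",
    "Time": "DateTime/Time",
    "FileName": "SourceInformation/File/FileName",
    "Description": "SourceInformation/File/FileAudioInformation/Description",
    "FileSize": "SourceInformation/File/FileSize",
    "FilePath": "SourceInformation/File/FilePath",
}


def extract_row(root):
    """Build one CSV row from a parsed report; missing tags become ''."""
    # findtext() returns the default when the path matches nothing,
    # so .strip() is always called on a string.
    return {name: root.findtext(path, default="").strip()
            for name, path in FIELDS.items()}


def export_reports(pattern, csv_path):
    """Parse every file matching `pattern` and write the rows to `csv_path`."""
    rows = [extract_row(ET.parse(name).getroot())
            for name in sorted(glob.glob(pattern))]
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(FIELDS))
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Assuming the reports live in C:\xml as in the question, calling `export_reports(r"C:\xml\*.xml", "EmotionData.csv")` would write one row per file.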