Convert xml file to pandas dataframe

I am new to using pandas with XML data and can't figure out how to convert an XML file to a pandas dataframe using the standard read_xml function. I tried the following code, but it does not pick up the data fields:

import pandas as pd

xml='''
<TimeSeries xmlns="http://www.wldelft.nl/fews/PI" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.wldelft.nl/fews/PI http://fews.wldelft.nl/schemas/version1.0/pi-schemas/pi_timeseries.xsd" version="1.26" xmlns:fs="http://www.wldelft.nl/fews/fs">
    <timeZone>1.0</timeZone>
    <series>
        <header>
            <type>instantaneous</type>
            <moduleInstanceId>pr.pompvolumes</moduleInstanceId>
            <locationId>SL000246</locationId>
            <parameterId>Q.B.d</parameterId>
            <timeStep unit="second" multiplier="86400"/>
            <startDate date="2018-01-01" time="00:00:00"/>
            <endDate date="2022-01-01" time="00:00:00"/>
            <missVal>NaN</missVal>
            <stationName>Putten vijzel</stationName>
            <lat>52.263570497449855</lat>
            <lon>5.495717667656339</lon>
            <x>162408.0</x>
            <y>475066.0</y>
            <units>m3/s</units>
        </header>
        <event date="2018-01-01" time="00:00:00" value="1.262" flag="0"/>
        <event date="2018-01-02" time="00:00:00" value="1.456" flag="0"/>
        <event date="2018-01-03" time="00:00:00" value="0.845" flag="0"/>
        <event date="2018-01-04" time="00:00:00" value="1.507" flag="0"/>
        <event date="2018-01-05" time="00:00:00" value="1.083" flag="0"/>
        <event date="2018-01-06" time="00:00:00" value="0.516" flag="0"/>
    </series>
</TimeSeries>
'''

df = pd.read_xml(xml)

The resulting dataframe should have the following format:

data = [['2018-01-01', 1.262, 0], ['2018-01-02', 1.456, 0], ['2018-01-03', 0.845, 0]]
df = pd.DataFrame(data, columns=['event date', 'value', 'flag' ])

Any help is much appreciated!

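  • Use pd.read_xml with a dict passed to the namespaces parameter, where each key is a temporary namespace prefix of your choosing (e.g. doc) and each value is the namespace URI declared via xmlns in the document (here: http://www.wldelft.nl/fews/PI).
  • Then use that prefix to write the correct xpath. Here: 'doc:series/doc:event'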
df = pd.read_xml(xml, xpath='doc:series/doc:event', 
                 namespaces={'doc':'http://www.wldelft.nl/fews/PI'})

print(df)

         date      time  value  flag
0  2018-01-01  00:00:00  1.262     0
1  2018-01-02  00:00:00  1.456     0
2  2018-01-03  00:00:00  0.845     0
3  2018-01-04  00:00:00  1.507     0
4  2018-01-05  00:00:00  1.083     0
5  2018-01-06  00:00:00  0.516     0

# drop `time`
df.drop('time', axis=1, inplace=True)
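
To see why the prefix mapping is needed, independent of pandas, here is a minimal sketch using the standard library's xml.etree.ElementTree. It uses a trimmed copy of the document from the question (header and most events omitted, which is my shortening, not the original file), and the names no_prefix, ns, events and rows are only illustrative. Because the root element declares a default xmlns, every child element belongs to that namespace, so an unprefixed path finds nothing while the prefixed one finds the events.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's document (header and most events omitted).
xml = '''
<TimeSeries xmlns="http://www.wldelft.nl/fews/PI" version="1.26">
    <timeZone>1.0</timeZone>
    <series>
        <event date="2018-01-01" time="00:00:00" value="1.262" flag="0"/>
        <event date="2018-01-02" time="00:00:00" value="1.456" flag="0"/>
        <event date="2018-01-03" time="00:00:00" value="0.845" flag="0"/>
    </series>
</TimeSeries>
'''

root = ET.fromstring(xml)

# The default xmlns puts every element in that namespace,
# so a path without a prefix matches nothing.
no_prefix = root.findall('series/event')

# 'doc' is an arbitrary prefix chosen here; only the URI has to match.
ns = {'doc': 'http://www.wldelft.nl/fews/PI'}
events = root.findall('doc:series/doc:event', ns)

# Attributes carry no namespace here, so plain attribute names work.
rows = [
    {
        'date': e.get('date'),
        'value': float(e.get('value')),
        'flag': int(e.get('flag')),
    }
    for e in events
]

print(len(no_prefix), len(events))
print(rows[0])
```

If you prefer this route, rows is a plain list of dicts, so pd.DataFrame(rows) should give the date, value and flag columns directly.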
