Python script fails to extract data from XML
I am trying to parse this XML file and get data out of it, but my code does not work as expected and prints nothing. This is the XML file I am using:
<feed xml:base="http://data.treasury.gov/Feed.svc/">
<title type="text">DailyTreasuryYieldCurveRateData</title>
<id>
http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData
</id>
<updated>2019-11-04T07:15:32Z</updated>
<link rel="self" title="DailyTreasuryYieldCurveRateData" href="DailyTreasuryYieldCurveRateData"/>
<entry>
<id>
http://data.treasury.gov/Feed.svc/DailyTreasuryYieldCurveRateData(7258)
</id>
<title type="text"/>
<updated>2019-11-04T07:15:32Z</updated>
<author>
<name/>
</author>
<link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(7258)"/>
<category term="TreasuryDataWarehouseModel.DailyTreasuryYieldCurveRateDatum" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme"/>
<content type="application/xml">
<m:properties>
<d:Id m:type="Edm.Int32">7258</d:Id>
<d:NEW_DATE m:type="Edm.DateTime">2019-01-02T00:00:00</d:NEW_DATE>
<d:BC_1MONTH m:type="Edm.Double">2.4</d:BC_1MONTH>
<d:BC_2MONTH m:type="Edm.Double">2.4</d:BC_2MONTH>
<d:BC_3MONTH m:type="Edm.Double">2.42</d:BC_3MONTH>
<d:BC_6MONTH m:type="Edm.Double">2.51</d:BC_6MONTH>
<d:BC_1YEAR m:type="Edm.Double">2.6</d:BC_1YEAR>
<d:BC_2YEAR m:type="Edm.Double">2.5</d:BC_2YEAR>
<d:BC_3YEAR m:type="Edm.Double">2.47</d:BC_3YEAR>
<d:BC_5YEAR m:type="Edm.Double">2.49</d:BC_5YEAR>
<d:BC_7YEAR m:type="Edm.Double">2.56</d:BC_7YEAR>
<d:BC_10YEAR m:type="Edm.Double">2.66</d:BC_10YEAR>
<d:BC_20YEAR m:type="Edm.Double">2.83</d:BC_20YEAR>
<d:BC_30YEAR m:type="Edm.Double">2.97</d:BC_30YEAR>
<d:BC_30YEARDISPLAY m:type="Edm.Double">2.97</d:BC_30YEARDISPLAY>
</m:properties>
</content>
</entry>
<entry>
<id>
http://data.treasury.gov/Feed.svc/DailyTreasuryYieldCurveRateData(7259)
</id>
<title type="text"/>
<updated>2019-11-04T07:15:32Z</updated>
<author>
<name/>
</author>
<link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(7259)"/>
<category term="TreasuryDataWarehouseModel.DailyTreasuryYieldCurveRateDatum" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme"/>
<content type="application/xml">
<m:properties>
<d:Id m:type="Edm.Int32">7259</d:Id>
<d:NEW_DATE m:type="Edm.DateTime">2019-01-03T00:00:00</d:NEW_DATE>
<d:BC_1MONTH m:type="Edm.Double">2.42</d:BC_1MONTH>
<d:BC_2MONTH m:type="Edm.Double">2.42</d:BC_2MONTH>
<d:BC_3MONTH m:type="Edm.Double">2.41</d:BC_3MONTH>
<d:BC_6MONTH m:type="Edm.Double">2.47</d:BC_6MONTH>
<d:BC_1YEAR m:type="Edm.Double">2.5</d:BC_1YEAR>
<d:BC_2YEAR m:type="Edm.Double">2.39</d:BC_2YEAR>
<d:BC_3YEAR m:type="Edm.Double">2.35</d:BC_3YEAR>
<d:BC_5YEAR m:type="Edm.Double">2.37</d:BC_5YEAR>
<d:BC_7YEAR m:type="Edm.Double">2.44</d:BC_7YEAR>
<d:BC_10YEAR m:type="Edm.Double">2.56</d:BC_10YEAR>
<d:BC_20YEAR m:type="Edm.Double">2.75</d:BC_20YEAR>
<d:BC_30YEAR m:type="Edm.Double">2.92</d:BC_30YEAR>
<d:BC_30YEARDISPLAY m:type="Edm.Double">2.92</d:BC_30YEARDISPLAY>
</m:properties>
</content>
</entry>
<entry>
<id>
http://data.treasury.gov/Feed.svc/DailyTreasuryYieldCurveRateData(7260)
</id>
<title type="text"/>
<updated>2019-11-04T07:15:32Z</updated>
<author>
<name/>
</author>
<link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(7260)"/>
<category term="TreasuryDataWarehouseModel.DailyTreasuryYieldCurveRateDatum" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme"/>
<content type="application/xml">
<m:properties>
<d:Id m:type="Edm.Int32">7260</d:Id>
<d:NEW_DATE m:type="Edm.DateTime">2019-01-04T00:00:00</d:NEW_DATE>
<d:BC_1MONTH m:type="Edm.Double">2.4</d:BC_1MONTH>
<d:BC_2MONTH m:type="Edm.Double">2.42</d:BC_2MONTH>
<d:BC_3MONTH m:type="Edm.Double">2.42</d:BC_3MONTH>
<d:BC_6MONTH m:type="Edm.Double">2.51</d:BC_6MONTH>
<d:BC_1YEAR m:type="Edm.Double">2.57</d:BC_1YEAR>
<d:BC_2YEAR m:type="Edm.Double">2.5</d:BC_2YEAR>
<d:BC_3YEAR m:type="Edm.Double">2.47</d:BC_3YEAR>
<d:BC_5YEAR m:type="Edm.Double">2.49</d:BC_5YEAR>
<d:BC_7YEAR m:type="Edm.Double">2.56</d:BC_7YEAR>
<d:BC_10YEAR m:type="Edm.Double">2.67</d:BC_10YEAR>
<d:BC_20YEAR m:type="Edm.Double">2.83</d:BC_20YEAR>
<d:BC_30YEAR m:type="Edm.Double">2.98</d:BC_30YEAR>
<d:BC_30YEARDISPLAY m:type="Edm.Double">2.98</d:BC_30YEARDISPLAY>
</m:properties>
</content>
</entry>
</feed>
So far, I have tried:
import requests as rq
import lxml.etree as ET
tree = ET.parse('DailyTreasuryYieldCurveRateData')
root = tree.getroot()
for ele in root.xpath('./entry/content/m:properties',
                      namespaces={'m': 'http://schemas.microsoft.com/ado/2007/08/dataservices/scheme'}):
    print(ele)
    for foo in ele:
        print(foo.tag, foo.text)
However, nothing is displayed.
I expect the output to be:
NEW_DATE 2019-10-31T00:00:00
BC_1MONTH 1.59
BC_2MONTH 1.59
BC_3MONTH 1.59
BC_6MONTH 1.57
BC_1YEAR 1.54
BC_2YEAR 1.52
BC_3YEAR 1.52
BC_5YEAR 1.51
BC_7YEAR 1.6
BC_10YEAR 1.69
BC_20YEAR 2
BC_30YEAR 2.17
BC_30YEARDISPLAY 2.17
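Likewise, it should loop over the complete XML and produce similar output for every entry. Please tell me where I am going wrong.
You can do this easily with python-benedict, a dict subclass.
To install it, run pip install python-benedict, then: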
data_xml = """
<feed xml:base="http://data.treasury.gov/Feed.svc/">
<title type="text">DailyTreasuryYieldCurveRateData</title>
<id>
http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData
</id>
<updated>2019-11-04T07:15:32Z</updated>
<link rel="self" title="DailyTreasuryYieldCurveRateData" href="DailyTreasuryYieldCurveRateData"/>
<entry>
<id>
http://data.treasury.gov/Feed.svc/DailyTreasuryYieldCurveRateData(7258)
</id>
<title type="text"/>
<updated>2019-11-04T07:15:32Z</updated>
<author>
<name/>
</author>
<link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(7258)"/>
<category term="TreasuryDataWarehouseModel.DailyTreasuryYieldCurveRateDatum" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme"/>
<content type="application/xml">
<m:properties>
<d:Id m:type="Edm.Int32">7258</d:Id>
<d:NEW_DATE m:type="Edm.DateTime">2019-01-02T00:00:00</d:NEW_DATE>
<d:BC_1MONTH m:type="Edm.Double">2.4</d:BC_1MONTH>
<d:BC_2MONTH m:type="Edm.Double">2.4</d:BC_2MONTH>
<d:BC_3MONTH m:type="Edm.Double">2.42</d:BC_3MONTH>
<d:BC_6MONTH m:type="Edm.Double">2.51</d:BC_6MONTH>
<d:BC_1YEAR m:type="Edm.Double">2.6</d:BC_1YEAR>
<d:BC_2YEAR m:type="Edm.Double">2.5</d:BC_2YEAR>
<d:BC_3YEAR m:type="Edm.Double">2.47</d:BC_3YEAR>
<d:BC_5YEAR m:type="Edm.Double">2.49</d:BC_5YEAR>
<d:BC_7YEAR m:type="Edm.Double">2.56</d:BC_7YEAR>
<d:BC_10YEAR m:type="Edm.Double">2.66</d:BC_10YEAR>
<d:BC_20YEAR m:type="Edm.Double">2.83</d:BC_20YEAR>
<d:BC_30YEAR m:type="Edm.Double">2.97</d:BC_30YEAR>
<d:BC_30YEARDISPLAY m:type="Edm.Double">2.97</d:BC_30YEARDISPLAY>
</m:properties>
</content>
</entry>
<entry>
<id>
http://data.treasury.gov/Feed.svc/DailyTreasuryYieldCurveRateData(7259)
</id>
<title type="text"/>
<updated>2019-11-04T07:15:32Z</updated>
<author>
<name/>
</author>
<link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(7259)"/>
<category term="TreasuryDataWarehouseModel.DailyTreasuryYieldCurveRateDatum" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme"/>
<content type="application/xml">
<m:properties>
<d:Id m:type="Edm.Int32">7259</d:Id>
<d:NEW_DATE m:type="Edm.DateTime">2019-01-03T00:00:00</d:NEW_DATE>
<d:BC_1MONTH m:type="Edm.Double">2.42</d:BC_1MONTH>
<d:BC_2MONTH m:type="Edm.Double">2.42</d:BC_2MONTH>
<d:BC_3MONTH m:type="Edm.Double">2.41</d:BC_3MONTH>
<d:BC_6MONTH m:type="Edm.Double">2.47</d:BC_6MONTH>
<d:BC_1YEAR m:type="Edm.Double">2.5</d:BC_1YEAR>
<d:BC_2YEAR m:type="Edm.Double">2.39</d:BC_2YEAR>
<d:BC_3YEAR m:type="Edm.Double">2.35</d:BC_3YEAR>
<d:BC_5YEAR m:type="Edm.Double">2.37</d:BC_5YEAR>
<d:BC_7YEAR m:type="Edm.Double">2.44</d:BC_7YEAR>
<d:BC_10YEAR m:type="Edm.Double">2.56</d:BC_10YEAR>
<d:BC_20YEAR m:type="Edm.Double">2.75</d:BC_20YEAR>
<d:BC_30YEAR m:type="Edm.Double">2.92</d:BC_30YEAR>
<d:BC_30YEARDISPLAY m:type="Edm.Double">2.92</d:BC_30YEARDISPLAY>
</m:properties>
</content>
</entry>
<entry>
<id>
http://data.treasury.gov/Feed.svc/DailyTreasuryYieldCurveRateData(7260)
</id>
<title type="text"/>
<updated>2019-11-04T07:15:32Z</updated>
<author>
<name/>
</author>
<link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(7260)"/>
<category term="TreasuryDataWarehouseModel.DailyTreasuryYieldCurveRateDatum" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme"/>
<content type="application/xml">
<m:properties>
<d:Id m:type="Edm.Int32">7260</d:Id>
<d:NEW_DATE m:type="Edm.DateTime">2019-01-04T00:00:00</d:NEW_DATE>
<d:BC_1MONTH m:type="Edm.Double">2.4</d:BC_1MONTH>
<d:BC_2MONTH m:type="Edm.Double">2.42</d:BC_2MONTH>
<d:BC_3MONTH m:type="Edm.Double">2.42</d:BC_3MONTH>
<d:BC_6MONTH m:type="Edm.Double">2.51</d:BC_6MONTH>
<d:BC_1YEAR m:type="Edm.Double">2.57</d:BC_1YEAR>
<d:BC_2YEAR m:type="Edm.Double">2.5</d:BC_2YEAR>
<d:BC_3YEAR m:type="Edm.Double">2.47</d:BC_3YEAR>
<d:BC_5YEAR m:type="Edm.Double">2.49</d:BC_5YEAR>
<d:BC_7YEAR m:type="Edm.Double">2.56</d:BC_7YEAR>
<d:BC_10YEAR m:type="Edm.Double">2.67</d:BC_10YEAR>
<d:BC_20YEAR m:type="Edm.Double">2.83</d:BC_20YEAR>
<d:BC_30YEAR m:type="Edm.Double">2.98</d:BC_30YEAR>
<d:BC_30YEARDISPLAY m:type="Edm.Double">2.98</d:BC_30YEARDISPLAY>
</m:properties>
</content>
</entry>
</feed>
"""
Now initialize a benedict instance:
from benedict import benedict as bdict
# this method accepts a data string, a file path or a file url
data = bdict.from_xml(data_xml)
# print(data.dump())
entries = data['feed.entry']
for entry in entries:
    props = bdict(bdict(entry)['content.m:properties'])
    # print(props.dump())
    for key, value in props.items():
        print(key, value['#text'])
    print('-----')
python-benedict is well tested, documented, and open source on GitHub:
https://github.com/fabiocaccamo/python-benedict
Disclaimer: I am the author of this project.
You opened the <feed> tag but did not close it. So just add </feed> to the end of the XML.
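Another possible cause of the empty result is the namespace handling in the XPath from the question: the URI mapped to m there is the category scheme URI, and entry and content are written without any prefix. The sketch below shows a namespace-aware lookup using the standard library's xml.etree.ElementTree. It rests on assumptions: the XML posted in the question shows no xmlns declarations, so the three URIs here are the usual Atom and OData ones and should be checked against the real file, and SAMPLE, NS, and extract_rows are hypothetical names made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs (the usual Atom / OData ones). The XML in the
# question shows no xmlns declarations, so verify these against the real file.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata",
    "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
}

# Shortened, hypothetical sample with the namespace declarations added.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
      xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:Id m:type="Edm.Int32">7258</d:Id>
        <d:NEW_DATE m:type="Edm.DateTime">2019-01-02T00:00:00</d:NEW_DATE>
        <d:BC_1MONTH m:type="Edm.Double">2.4</d:BC_1MONTH>
      </m:properties>
    </content>
  </entry>
</feed>"""


def extract_rows(xml_text):
    """Return one {local_name: text} dict per <m:properties> block."""
    root = ET.fromstring(xml_text)
    rows = []
    # entry and content live in the default (Atom) namespace, so they
    # need a prefix in the path as well, not only m:properties.
    for props in root.findall("./atom:entry/atom:content/m:properties", NS):
        row = {}
        for child in props:
            # Tags look like '{namespace-uri}NEW_DATE'; keep the local name.
            local_name = child.tag.split("}", 1)[-1]
            row[local_name] = child.text
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in extract_rows(SAMPLE):
        for name, value in row.items():
            print(name, value)
        print("-----")
```

With lxml, root.xpath(path, namespaces=NS) accepts the same kind of prefix-to-URI mapping, so the path and the NS dictionary should carry over once the URIs match the declarations in the actual feed.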