Parsing an XML file with Python

Question

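I am trying to parse an XML file located in the same folder as my Python script, but when I run the script it does not print anything to the terminal as I expected. I am using ElementTree, and this is my code: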

import xml.etree.ElementTree

f = xml.etree.ElementTree.parse('atom.xml').getroot()
for atype in f.findall('link'):
   print(atype.get('href'))

This is the XML that contains the href I want to get:

<?xml version='1.0' ?>
 <feed xmlns="http://www.w3.org/2005/Atom">
 <title type="text">Gwern</title>
 <id>https://www.gwern.net/</id>
 <updated>2017-07-22T14:57:39Z</updated>
 <link href="https://www.gwern.net/atom.xml" rel="self" />
<author>
<name>gwern</name>
</author>
<author>
 <name>ujdRR</name>
</author>
 <generator uri="http://github.com/jgm/gitit"    version="HEAD">gitit</generator>
<entry>
<id>https://www.gwern.net/Mail%20delivery?   utm_source=RSS&amp;utm_medium=feed&amp;utm_campaign=1</id>
  <title type="text">Modified &quot;Mail delivery.page&quot;, Modified   &quot;Mistakes.page&quot;, Modified &quot;Nootropics.page&quot;, Modified &quot;Touhou.page&quot;, Modified &quot;Wikipedia resume.page&quot;,         &quot;Zeo.page&quot;, Modified &quot;hakyll.hs&quot;, Modified &quot;newsletter/2017/06.page&quot;, Modified &quot;the-long-stagnation.page&quot;, Modified &quot;wittgenstein-thesis.page&quot;</title>
<updated>2017-06-25T04:00:06Z</updated>
<author>
  <name>gwern</name>
</author>
<link href="https://www.gwern.net/Mail%20delivery?utm_source=RSS&amp;utm_medium=feed&amp;utm_campaign=1" rel="alternate" />
<summary type="text">record all minor pending edits</summary>

Answer 1

Question: ... the href I want to get from the XML

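Your XML declares a default namespace, <feed xmlns="http://www.w3.org/2005/Atom">, so every element in it belongs to that namespace and a bare 'link' matches nothing.
You therefore have to pass the namespaces argument to findall.
Also note that the XML contains two <link ...> tags, one of which is inside the <entry> tag.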
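
To see why the original loop prints nothing, here is a minimal sketch. It assumes a trimmed-down copy of the feed held in an illustrative SAMPLE string (not the full atom.xml). ElementTree stores each tag of a namespaced document as {namespace-uri}localname, so a search for the bare name comes back empty.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for atom.xml (illustrative only, not the full feed).
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title type="text">Gwern</title>
  <link href="https://www.gwern.net/atom.xml" rel="self" />
</feed>"""

root = ET.fromstring(SAMPLE)

# Every tag carries the namespace URI in braces.
tags = [child.tag for child in root]
print(tags)

# A bare tag name therefore matches nothing...
bare = root.findall('link')
print(len(bare))

# ...while the fully qualified name matches the element.
qualified = root.findall('{http://www.w3.org/2005/Atom}link')
print([el.get('href') for el in qualified])
```

Writing the URI in braces works without any mapping, but it gets verbose quickly, which is where the namespaces argument helps.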

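findall(self, path, namespaces=None)
Finds all elements matching the ElementPath expression. Same as getroot().findall(path).
The optional namespaces argument accepts a prefix-to-namespace mapping that allows the use of XPath prefixes in the path expression.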

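import xml.etree.ElementTree as ET

# Parse the file first; 'tree' was not defined in the original snippet.
tree = ET.parse('atom.xml')
root = tree.getroot()

# Map a prefix to the namespace URI; the prefix name itself is arbitrary.
namespaces = {
    'xmlns': "http://www.w3.org/2005/Atom"
}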

# Get the First <link ...> Outside <entry>
link = root.findall('./xmlns:link', namespaces)[0]
print('link:{} {}'.format(link, link.get('href')))

# Find all <link ...> Inside <entry>
for link in root.findall('./xmlns:entry/xmlns:link', namespaces):
    print(link.get('href'))

Output:

 link:<Element {http://www.w3.org/2005/Atom}link at 0xf6a6d8ac> https://www.gwern.net/atom.xml
 https://www.gwern.net/Mail%20delivery?utm_source=RSS&utm_medium=feed&utm_campaign=1

Tested with Python 3.4.2
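
If the position of the <link> elements is not known in advance, a descendant search may be more convenient than spelling out each path. The sketch below is only an illustration: it assumes a trimmed-down feed held in a hypothetical SAMPLE string, and it uses atom as the prefix (any name works as long as it matches the key in the mapping).

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for atom.xml (illustrative only, not the full feed).
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <link href="https://www.gwern.net/atom.xml" rel="self" />
  <entry>
    <link href="https://www.gwern.net/Mail%20delivery?utm_source=RSS&amp;utm_medium=feed&amp;utm_campaign=1" rel="alternate" />
  </entry>
</feed>"""

# The prefix 'atom' is an arbitrary choice for this sketch.
NS = {'atom': 'http://www.w3.org/2005/Atom'}
root = ET.fromstring(SAMPLE)

# './/' searches all descendants, so the feed-level and the entry-level
# links are both returned, in document order.
hrefs = [link.get('href') for link in root.findall('.//atom:link', NS)]
for href in hrefs:
    print(href)

# An attribute predicate narrows the match to links with rel="alternate".
alternate = [
    link.get('href')
    for link in root.findall(".//atom:link[@rel='alternate']", NS)
]
print(alternate)
```

The parser decodes &amp;amp; to a plain &amp; in the attribute value, which is why the printed URL matches the output shown in the answer.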

Solution 1
2 2017-07-25 10:08:06
