Parsing an XML file with Python: multiple levels

Question

I want to parse the following XML file with Python. My "folder" variable is always set to the 8-digit number at the end of the <link> tag. In this case, it is 11199709.

Python

for folder in folderList:
    ...  # look up the <eq:seconds> value of the item whose <link> ends with this folder ID

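In other words: when "folder" equals the last 8 digits in the <link> tag, I want to get the value of eq:seconds. I tried the ElementTree code from the Python docs, but I'm having trouble because of all the levels of nesting: root[0][1].text does not reach the values under the <item> tag.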
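To see why index-based access stops short, it helps to print what each index refers to. In the feed below, `root` is `<rss>`, `root[0]` is `<channel>`, and `root[0][1]` is the channel's `<description>`; the `<item>` elements are further children of `<channel>`. `Element.iter("item")` searches every descendant, so the depth of `<item>` does not matter. A small sketch with a hypothetical trimmed copy of the feed:

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed feed with the same nesting as the question's XML.
feed = """<rss version="2.0">
  <channel>
    <title>USGS Earthquake ShakeMaps</title>
    <description>List of ShakeMaps for events in the last 30 days</description>
    <item><title>4.11 - 79.3 miles NNW of Kotzebue</title></item>
  </channel>
</rss>"""

root = ET.fromstring(feed)
print(root.tag, root[0].tag, root[0][1].tag)  # rss channel description

# iter() walks all descendants, not just direct children.
for item in root.iter("item"):
    print(item.findtext("title"))  # 4.11 - 79.3 miles NNW of Kotzebue
```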

XML

<rss xmlns:georss="http://www.georss.org/georss/" xmlns:eq="http://earthquake.usgs.gov/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" version="2.0">
  <channel>
    <title>USGS Earthquake ShakeMaps</title>
    <description>List of ShakeMaps for events in the last 30 days</description>
    <link>http://earthquake.usgs.gov/</link>
    <dc:publisher>U.S. Geological Survey</dc:publisher>
    <pubDate>Thu, 27 Mar 2014 15:33:05 +0000</pubDate>
    <item>
      <title>4.11 - 79.3 miles NNW of Kotzebue</title>
      <description>
      <![CDATA[<img src="http://earthquake.usgs.gov/eqcenter/shakemap/thumbs/shakemap_ak_11199709.jpg" width="100" align="left" hspace="10"/><p>Date: Thu, 27 Mar 2014 07:28:31 UTC<br/>Lat/Lon: 67.9858/-163.494<br/>Depth: 15.9122</p>]]></description>
      <link>http://earthquake.usgs.gov/eqcenter/shakemap/ak/shake/11199709/</link>
      <pubDate>Thu, 27 Mar 2014 07:53:33 +0000</pubDate>
      <geo:lat>67.9858</geo:lat>
      <geo:long>-163.494</geo:long>
      <dc:subject>4</dc:subject>
      <eq:seconds>1395905311</eq:seconds>
      <eq:depth>15.9122</eq:depth>
      <eq:region>ak</eq:region>
    </item>
    <item>
      ...similar to above item
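A minimal sketch of the lookup with the standard library's xml.etree.ElementTree, assuming the feed has the structure shown above. Two things matter: `<item>` sits at `channel/item` below the root, and `eq:seconds` belongs to the namespace declared on `<rss>` as `xmlns:eq="http://earthquake.usgs.gov/rss/1.0/"`, so ElementTree needs a prefix-to-URI mapping to find it. The helper name `seconds_for_folder`, the trimmed `SAMPLE` feed, and the IDs in `folderList` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the feed from the question (hypothetical sample data).
SAMPLE = """<rss xmlns:eq="http://earthquake.usgs.gov/rss/1.0/" version="2.0">
  <channel>
    <title>USGS Earthquake ShakeMaps</title>
    <item>
      <link>http://earthquake.usgs.gov/eqcenter/shakemap/ak/shake/11199709/</link>
      <eq:seconds>1395905311</eq:seconds>
    </item>
  </channel>
</rss>"""

# Map the "eq" prefix used in the document to its namespace URI.
NS = {"eq": "http://earthquake.usgs.gov/rss/1.0/"}


def seconds_for_folder(root, folder):
    """Return the <eq:seconds> text of the item whose link ends in `folder`, else None."""
    for item in root.findall("channel/item"):  # path from <rss> down to each <item>
        link = item.findtext("link", default="")
        # Compare the last path segment of the URL, ignoring the trailing slash.
        if link.rstrip("/").split("/")[-1] == folder:
            return item.findtext("eq:seconds", namespaces=NS)
    return None


root = ET.fromstring(SAMPLE)  # for a file: ET.parse("filename.xml").getroot()
folderList = ["11199709", "12345678"]  # hypothetical folder IDs
for folder in folderList:
    print(folder, seconds_for_folder(root, folder))
```

With this sample, the first ID yields `1395905311` and the second yields `None`, since no item links to it.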

Answer 1

If speed is a concern, I recommend lxml. It is an extra dependency, but it is usually much faster than BeautifulSoup.
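A sketch of the lxml route, assuming lxml may or may not be installed: `lxml.etree` follows the ElementTree API closely, so the same `iterparse` code runs on either library. `iterparse` streams the input and lets each finished `<item>` be cleared, and building a dictionary once turns every later folder lookup into a dict access instead of a rescan of the feed. The function name `index_seconds_by_folder` and the sample feed are hypothetical.

```python
import io

try:
    from lxml import etree as ET  # faster C implementation, if installed
except ImportError:
    import xml.etree.ElementTree as ET  # standard-library fallback

EQ = "{http://earthquake.usgs.gov/rss/1.0/}"  # expanded form of the "eq" prefix


def index_seconds_by_folder(source):
    """Map each item's folder ID (last segment of <link>) to its <eq:seconds> text."""
    index = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":  # the whole <item> subtree is available at its end event
            link = elem.findtext("link", default="")
            folder = link.rstrip("/").split("/")[-1]
            index[folder] = elem.findtext(EQ + "seconds")
            elem.clear()  # drop the finished item's children to save memory
    return index


# Hypothetical trimmed feed; a filename such as "filename.xml" also works as source.
feed = b"""<rss xmlns:eq="http://earthquake.usgs.gov/rss/1.0/" version="2.0"><channel>
<item><link>http://earthquake.usgs.gov/eqcenter/shakemap/ak/shake/11199709/</link>
<eq:seconds>1395905311</eq:seconds></item>
</channel></rss>"""

index = index_seconds_by_folder(io.BytesIO(feed))
print(index.get("11199709"))  # 1395905311
```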

Answer 2

Use BeautifulSoup, which can parse both HTML and XML (XML parsing needs the external lxml module) and is easier to use than the parsers included with Python.

This code should do what you want:

from bs4 import BeautifulSoup

# Load the XML file; the "xml" feature uses lxml's XML parser (lxml must be installed).
# You can also fetch the feed from a URL with "urllib" or "Python-Requests".
with open("filename.xml") as f:
    soup = BeautifulSoup(f, "xml")

# folderList is the list of 8-digit folder IDs from the question
for folder in folderList:
    for item in soup.find_all("item"):  # iterate through all <item> elements
        if folder in item.link.text:  # if the folder ID appears in the <link> element
            print(item.find("eq:seconds").text)  # print the <eq:seconds> value
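One caveat with `folder in item.link.text`: it is a substring test, so it also matches any link in which the ID appears as part of a longer number. A stricter check compares the ID with the last path segment of the URL. A small sketch using the standard library's `urllib.parse.urlparse`; the helper name `folder_from_link` is hypothetical.

```python
from urllib.parse import urlparse


def folder_from_link(link):
    """Return the last non-empty path segment of a URL, e.g. the 8-digit folder ID."""
    path = urlparse(link.strip()).path  # drop scheme and host; strip surrounding whitespace
    segments = [s for s in path.split("/") if s]
    return segments[-1] if segments else ""


link = "http://earthquake.usgs.gov/eqcenter/shakemap/ak/shake/11199709/"
print(folder_from_link(link))               # 11199709
print("1119970" in link)                    # True: the substring test over-matches
print(folder_from_link(link) == "1119970")  # False: the exact comparison does not
```

Inside the loop above, the test would then read `if folder_from_link(item.link.text) == folder:`.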

Question description

2 solutions

Solution 1
1 2014-03-28 14:48:36

Solution 2
0
