
Python XML parsing with ElementTree: How to find values of elements with the same name?

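Disclaimer: I'm new to Python, XML, and programming in general. The code below (which I took from the internet) works, but it has a problem that I can't find an answer to or solve myself.

I'm trying to parse an XML file from the grants.gov XML extract site. The goal is to remove every funding opportunity that is not in the "unrestricted" eligibility category (tagged in the XML as an "EligibilityCategory" of "99") and to write the result to a new XML file.

My code below correctly removes the funding opportunities I'm not interested in, but it also removes funding opportunities that have multiple EligibilityCategory elements, even when one of them is "99". I think this is because .find only grabs the first occurrence. I tried using .findall but couldn't get it to work. Thanks in advance for any help.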

import xml.etree.ElementTree as etree
tree = etree.parse('sample.xml')
root = tree.getroot()

for FundingOppSynopsis in root.findall('FundingOppSynopsis'): 
    ID = int(FundingOppSynopsis.find('EligibilityCategory').text)
    if ID != 99:
        root.remove(FundingOppSynopsis)

tree.write("Output/output.xml", xml_declaration=True, encoding='UTF-8', method="xml")

Sample (significantly reduced) XML:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Grants SYSTEM "http://apply07.grants.gov/search/dtd/XMLExtract.dtd">
<Grants>
    <FundingOppSynopsis>
        <FundingOppNumber>USDA-RMA-RME-2008-03</FundingOppNumber>
        <ApplicationsDueDate>03242008</ApplicationsDueDate>
        <Office>Risk Management Agency</Office>
        <Agency>Department of Agriculture</Agency>
        <EligibilityCategory>25</EligibilityCategory>
    </FundingOppSynopsis>
    <FundingOppSynopsis>
        <FundingOppNumber>NPS-ARRAWHIS100315</FundingOppNumber>
        <ApplicationsDueDate>11282009</ApplicationsDueDate>
        <Office>National Park Service</Office>
        <Agency>Department of the Interior</Agency>
        <EligibilityCategory>00</EligibilityCategory>
    </FundingOppSynopsis>
    <FundingOppSynopsis>
        <FundingOppNumber>OFDA-FY08-002-APS</FundingOppNumber>
        <ApplicationsDueDate>10102008</ApplicationsDueDate>
        <Office>None</Office>
        <Agency>Agency for International Development</Agency>
        <EligibilityCategory>99</EligibilityCategory>
    </FundingOppSynopsis>
    <FundingOppSynopsis>
        <FundingOppNumber>AK-NOI08-0004</FundingOppNumber>
        <ApplicationsDueDate>07142008</ApplicationsDueDate>
        <Office>Bureau of Land Management</Office>
        <Agency>Department of the Interior</Agency>
        <EligibilityCategory>99</EligibilityCategory>
    </FundingOppSynopsis>
    <FundingOppSynopsis>
        <FundingOppNumber>RD-RBP-BIOMASS-2007-FULL</FundingOppNumber>
        <ApplicationsDueDate>11162007</ApplicationsDueDate>
        <Office>Business and Cooperative Programs</Office>
        <Agency>Department of Agriculture</Agency>
        <EligibilityCategory>06</EligibilityCategory>
        <EligibilityCategory>12</EligibilityCategory>
        <EligibilityCategory>13</EligibilityCategory>
        <EligibilityCategory>20</EligibilityCategory>
        <EligibilityCategory>22</EligibilityCategory>
        <EligibilityCategory>23</EligibilityCategory>
        <EligibilityCategory>25</EligibilityCategory>
    </FundingOppSynopsis>
    <FundingOppSynopsis>
        <FundingOppNumber>BAA07-10</FundingOppNumber>
        <ApplicationsDueDateExplanation>The due dates and times established for the receipt of White Papers and Full Proposals are as indicated in Section IV, Paragraph 3 of the BAA. </ApplicationsDueDateExplanation>
        <Office>Office of Procurement Operations - Grants Division</Office>
        <Agency>Department of Homeland Security</Agency>
        <EligibilityCategory>00</EligibilityCategory>
        <EligibilityCategory>01</EligibilityCategory>
        <EligibilityCategory>02</EligibilityCategory>
        <EligibilityCategory>04</EligibilityCategory>
        <EligibilityCategory>05</EligibilityCategory>
        <EligibilityCategory>06</EligibilityCategory>
        <EligibilityCategory>07</EligibilityCategory>
        <EligibilityCategory>08</EligibilityCategory>
        <EligibilityCategory>11</EligibilityCategory>
        <EligibilityCategory>12</EligibilityCategory>
        <EligibilityCategory>13</EligibilityCategory>
        <EligibilityCategory>20</EligibilityCategory>
        <EligibilityCategory>21</EligibilityCategory>
        <EligibilityCategory>22</EligibilityCategory>
        <EligibilityCategory>23</EligibilityCategory>
        <EligibilityCategory>25</EligibilityCategory>
        <EligibilityCategory>99</EligibilityCategory>
    </FundingOppSynopsis>
</Grants>

You can use an XPath request to get what you want.

import xml.etree.ElementTree as etree
tree = etree.parse('sample.xml')
root = tree.getroot()

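# Select every FundingOppSynopsis that has an EligibilityCategory child with text '99'
req = tree.findall("./FundingOppSynopsis[EligibilityCategory='99']")

for r in req:
    print(r)  # print() call form works on both Python 2 and Python 3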

The request above returns a list of all the FundingOppSynopsis elements in the document that have a child tagged EligibilityCategory containing the text "99".
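To connect this back to the original goal of removing the unwanted entries, here is a minimal sketch that uses the same XPath predicate to decide what to keep. The inline `SAMPLE` string is a hypothetical, cut-down stand-in for `sample.xml`, and the variable names `keep` and `kept_numbers` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down stand-in for sample.xml (not the real extract).
SAMPLE = """<Grants>
    <FundingOppSynopsis>
        <FundingOppNumber>USDA-RMA-RME-2008-03</FundingOppNumber>
        <EligibilityCategory>25</EligibilityCategory>
    </FundingOppSynopsis>
    <FundingOppSynopsis>
        <FundingOppNumber>OFDA-FY08-002-APS</FundingOppNumber>
        <EligibilityCategory>99</EligibilityCategory>
    </FundingOppSynopsis>
    <FundingOppSynopsis>
        <FundingOppNumber>BAA07-10</FundingOppNumber>
        <EligibilityCategory>00</EligibilityCategory>
        <EligibilityCategory>99</EligibilityCategory>
    </FundingOppSynopsis>
</Grants>"""

root = ET.fromstring(SAMPLE)

# Elements with at least one EligibilityCategory child whose text is '99'.
keep = root.findall("./FundingOppSynopsis[EligibilityCategory='99']")

# findall returns a list, so removing from root inside this loop is safe.
for synopsis in root.findall("FundingOppSynopsis"):
    if synopsis not in keep:
        root.remove(synopsis)

kept_numbers = [s.findtext("FundingOppNumber")
                for s in root.findall("FundingOppSynopsis")]
print(kept_numbers)  # ['OFDA-FY08-002-APS', 'BAA07-10']
```

With a real file you would presumably parse it with `ET.parse(...)` and call `tree.write(...)` afterwards, as in the question.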

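For more information on XPath requests, see the XPath documentation. For more on using XPath in Python, see the Python xml.etree.ElementTree documentation, which supports a limited subset of XPath.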

You need to use findall to extract the list of categories, and then check whether 99 is in that list. You can use a list comprehension like this:

for FundingOppSynopsis in root.findall('FundingOppSynopsis'): 
    IDs = [int(category.text) for category in FundingOppSynopsis.findall('EligibilityCategory')]
    if 99 not in IDs:
        root.remove(FundingOppSynopsis)
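The snippet above relies on `root` from the question. As a self-contained illustration of why `find` was not enough, here is a small sketch. The `SAMPLE` string is a hypothetical, reduced version of the data, and `keep_only_category` is a helper name invented for this example. It compares the text as strings rather than calling `int()`, on the assumption that an empty element should be skipped instead of raising an error.

```python
import xml.etree.ElementTree as ET

# Hypothetical, reduced data in the same shape as the sample XML.
SAMPLE = """<Grants>
    <FundingOppSynopsis>
        <FundingOppNumber>RD-RBP-BIOMASS-2007-FULL</FundingOppNumber>
        <EligibilityCategory>06</EligibilityCategory>
        <EligibilityCategory>12</EligibilityCategory>
    </FundingOppSynopsis>
    <FundingOppSynopsis>
        <FundingOppNumber>BAA07-10</FundingOppNumber>
        <EligibilityCategory>00</EligibilityCategory>
        <EligibilityCategory>99</EligibilityCategory>
    </FundingOppSynopsis>
    <FundingOppSynopsis>
        <FundingOppNumber>AK-NOI08-0004</FundingOppNumber>
        <EligibilityCategory>99</EligibilityCategory>
    </FundingOppSynopsis>
</Grants>"""


def keep_only_category(root, wanted="99"):
    """Remove each FundingOppSynopsis with no EligibilityCategory equal to `wanted`."""
    for synopsis in root.findall("FundingOppSynopsis"):
        # findall returns every matching child, not just the first one.
        categories = {(c.text or "").strip()
                      for c in synopsis.findall("EligibilityCategory")}
        if wanted not in categories:
            root.remove(synopsis)
    return root


root = ET.fromstring(SAMPLE)

# find() only sees the first EligibilityCategory of BAA07-10, which is '00'.
multi = root.findall("FundingOppSynopsis")[1]
first_only = multi.find("EligibilityCategory").text
print(first_only)  # 00

keep_only_category(root)
remaining = [s.findtext("FundingOppNumber")
             for s in root.findall("FundingOppSynopsis")]
print(remaining)  # ['BAA07-10', 'AK-NOI08-0004']
```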

