
Searching XML tables using LXML & XPath

I'm trying to find a better way to navigate an XML file in Python. At the moment I have something like this.

    import xml.etree.ElementTree as ET

    tree = ET.parse('file.xml')
    root = tree.getroot()
    TagSlug = '{http://www.aec.gov.au/xml/schema/mediafeed}'  # namespace prefix carried by every tag
    for child in root.findall(TagSlug + 'PollingDistrict'):
        DID = child.find(TagSlug + 'PollingDistrictIdentifier')
        Name = DID.find(TagSlug + 'Name')
        for grandchild in child:  # PollingDistrictIdentifier, then PollingPlaces
            for grandgrandchild in grandchild:  # Name, or each PollingPlace
                PP = grandgrandchild.find(TagSlug + 'PollingPlaceIdentifier')
                if PP is not None:  # skip elements that hold no identifier
                    print(PP.attrib['Id'], PP.attrib['Name'], DID.attrib['Id'], Name.text)

The XML is structured along the following lines.

    <PollingDistrictList Created="2018-10-30T12:01:21.043" xmlns="http://www.aec.gov.au/xml/schema/mediafeed" xmlns:eml="urn:oasis:names:tc:evs:schema:eml" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:xal="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0" xmlns:xnl="urn:oasis:names:tc:ciq:xsdschema:xNL:2.0" xmlns:ts="urn:oasis:names:tc:evs:schema:eml:ts" xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
      <TransactionId>4C59F7F3-2405-4443-8A1F-3F2BEF6E07C4</TransactionId>
      <eml:EventIdentifier Id="12122">
        <eml:EventName>State Election 2018</eml:EventName>
      </eml:EventIdentifier>
      <PollingDistrict>
        <PollingDistrictIdentifier Id="10153">
          <Name>Albert Park District</Name>
        </PollingDistrictIdentifier>
        <PollingPlaces>
          <PollingPlace>
            <PollingPlaceIdentifier Id="13133" Name="Bridport" />
            <WheelchairAccess>None</WheelchairAccess>
          </PollingPlace>
          <PollingPlace>
            <PollingPlaceIdentifier Id="13987" Name="Kerferd South" />
            <WheelchairAccess>None</WheelchairAccess>
          </PollingPlace>
          <PollingPlace>
            <PollingPlaceIdentifier Id="13504" Name="Middle Park" />
            <WheelchairAccess>None</WheelchairAccess>
          </PollingPlace>
        </PollingPlaces>
      </PollingDistrict>
      <PollingDistrict>
        <PollingDistrictIdentifier = ....
    et cetera

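I'm trying to print a list of polling place ID, polling place name, district ID and district name, but I'm having trouble with the last part. I've tried several different approaches; here are a few: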

    a = tree.findall('./PollingDistrictList/PollingDistrict/PollingPlaces/PollingPlace')
    print(a.text)

    a = tree.findall('.//PollingPlace')
    print(a.text)

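I end up with an error saying that either 'NoneType' or 'list' has no attribute 'text', and if I remove '.text' I get nothing. I'm looking for a better way to navigate the XML file than this recursive "for child in root" approach.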
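Both symptoms can be reproduced on a cut-down sample. `findall` always returns a list, so `.text` has to be read from each element rather than from the result itself. Because the document declares a default namespace, an unqualified path such as `.//PollingPlace` matches nothing, which is why the list comes back empty. The first attempt has one more problem: its path begins with the root element's own name, while the search already starts at the root. The sketch below is only an illustration; the inline `SAMPLE` string and the names `NS`, `unqualified`, `qualified` and `access` are introduced here and are not part of the original code.

```python
import xml.etree.ElementTree as ET

# Hypothetical cut-down sample built from the structure shown in the question.
SAMPLE = """<PollingDistrictList xmlns="http://www.aec.gov.au/xml/schema/mediafeed">
  <PollingDistrict>
    <PollingDistrictIdentifier Id="10153">
      <Name>Albert Park District</Name>
    </PollingDistrictIdentifier>
    <PollingPlaces>
      <PollingPlace>
        <PollingPlaceIdentifier Id="13133" Name="Bridport" />
        <WheelchairAccess>None</WheelchairAccess>
      </PollingPlace>
    </PollingPlaces>
  </PollingDistrict>
</PollingDistrictList>"""

NS = '{http://www.aec.gov.au/xml/schema/mediafeed}'
root = ET.fromstring(SAMPLE)

# Unqualified tag: with a default namespace in force, no element is
# literally named "PollingPlace", so the result is an empty list.
unqualified = root.findall('.//PollingPlace')

# Qualified tag: the namespace URI in braces is part of each element's name.
qualified = root.findall('.//' + NS + 'PollingPlace')

# findall returns a list, so read values from each element in turn.
access = [pp.find(NS + 'WheelchairAccess').text for pp in qualified]

print(unqualified, len(qualified), access)
```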

Ideally, I would get:

[PP1Id], [PP1Name], [District1Id], [District1Name]
[PP2Id], [PP2Name], [District1Id], [District1Name]
...
[PP1Id], [PP1Name], [District2Id], [District2Name]
etc

Any advice would be greatly appreciated.

I solved this using the following. It is commented so you can see what it is doing.

    import os  # Required to change directory
    import xml.etree.ElementTree as ET  # Parses the XML
    import requests  # Will be used for the VEC site; not utilised at this stage

    os.chdir('C:/XMLDataLocation')  # Set working directory
    tree = ET.parse('State2018MediaFilePollingLocations.xml')  # Load the file
    root = tree.getroot()

    TagSlug = '{http://www.aec.gov.au/xml/schema/mediafeed}'  # Namespace prefix on every tag; storing it once saves space

    PollingDistricts = root.findall(TagSlug + 'PollingDistrict')  # Level 0 (root) to level 1 (PollingDistrict)
    for PollingDistrict in PollingDistricts:  # Loop, otherwise only the first district would display
        DistrictID = PollingDistrict.find(TagSlug + 'PollingDistrictIdentifier')  # District identifier element
        Name = DistrictID.find(TagSlug + 'Name')  # Electorate name (a child of DistrictID)
        PollingPlaces = PollingDistrict.find(TagSlug + 'PollingPlaces')  # Container element, needed only for navigation
        for PollingPlace in PollingPlaces.findall(TagSlug + 'PollingPlace'):  # Loop, otherwise only the first booth in each electorate would print
            PPID = PollingPlace.find(TagSlug + 'PollingPlaceIdentifier')  # Holds both the booth ID and name
            print(PPID.attrib['Id'], PPID.attrib['Name'], DistrictID.attrib['Id'], Name.text)  # One row per booth
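The same traversal can be written a little more compactly by passing a prefix-to-URI mapping to `find`, `findall` and `findtext` instead of concatenating the namespace onto each tag. This is a sketch of an alternative, not the approach used above: the prefix `mf`, the helper `polling_place_rows` and the inline `SAMPLE` string (cut down from the XML in the question) are names chosen here for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical cut-down sample using the values shown in the question.
SAMPLE = """<PollingDistrictList xmlns="http://www.aec.gov.au/xml/schema/mediafeed">
  <PollingDistrict>
    <PollingDistrictIdentifier Id="10153">
      <Name>Albert Park District</Name>
    </PollingDistrictIdentifier>
    <PollingPlaces>
      <PollingPlace>
        <PollingPlaceIdentifier Id="13133" Name="Bridport" />
      </PollingPlace>
      <PollingPlace>
        <PollingPlaceIdentifier Id="13987" Name="Kerferd South" />
      </PollingPlace>
      <PollingPlace>
        <PollingPlaceIdentifier Id="13504" Name="Middle Park" />
      </PollingPlace>
    </PollingPlaces>
  </PollingDistrict>
</PollingDistrictList>"""

# 'mf' is an arbitrary local prefix; only the URI has to match the document.
NS = {'mf': 'http://www.aec.gov.au/xml/schema/mediafeed'}


def polling_place_rows(root):
    """Return (place id, place name, district id, district name) tuples."""
    rows = []
    for district in root.findall('mf:PollingDistrict', NS):
        ident = district.find('mf:PollingDistrictIdentifier', NS)
        district_name = ident.findtext('mf:Name', namespaces=NS)
        # A single path reaches every identifier below this district.
        path = 'mf:PollingPlaces/mf:PollingPlace/mf:PollingPlaceIdentifier'
        for place in district.findall(path, NS):
            rows.append((place.get('Id'), place.get('Name'),
                         ident.get('Id'), district_name))
    return rows


rows = polling_place_rows(ET.fromstring(SAMPLE))
for row in rows:
    print(*row)
```

With a real file, `ET.parse(...).getroot()` would take the place of `ET.fromstring(SAMPLE)`. If full XPath is needed, lxml's `xpath()` method accepts a similar `namespaces` mapping.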
