
Looking for a way to count the number of XML element appearances up to a certain point

I am new to Python and XML, so maybe I'm not using the correct terms to find what I need, but I looked around for a while on Stack Overflow and also read the documentation for DOM and minidom, and could not find anything.

<AppName>
    <author>Person 1</author>
        <out>Output 1</out>
        <out>Output 2</out>
        <out>Output 3</out>
    <description> Description</description>
    <date>2012-11-06</date>
</AppName>
<AppName>
    <author>Person 2</author>
        <out>Output 1</out>
        <out>Output 2</out>
        <out>Output 3</out>
        <out>Output 4</out>
    <description> Description</description>
    <date>2012-11-06</date>        
</AppName>
    ...
  continues for 500 AppNames

I am trying to pair up the information and write it to a file like this:

Person1 || Output1
Person1 || Output2
Person1 || Output3
Person2 || Output1
Person2 || Output2
Person2 || Output3
    etc...

But when I use minidom to read from a file

dom = xml.dom.minidom.parse(filename)
authorList = dom.getElementsByTagName('author')
outList = dom.getElementsByTagName('out')

I don't know how to pair them up effectively, since the number of out elements varies from author to author and I don't know how to count how many belong to a specific author. I'm currently writing it like this:

text_file = open ("author.txt", "w")
for i in range(0, len(authorList)):
    text_file.write(authorList.__getitem__(i).firstChild.nodeValue)
    text_file.write(" || ")
    text_file.write(outList.__getitem__(i).firstChild.nodeValue)

text_file.close()

This is clearly incorrect, but I can't figure out how to pair them up without knowing how many out elements belong to each author. Any help with that, or another way to achieve the desired result, is welcome.

I already looked at the documentation on Dom and minidom and I know you can

len(dom.getElementsByTagName('out'))

but this only gives me the total number of out elements in the whole XML file.

Any pointers or tips would be greatly appreciated.

The trick here is that you want to parse each AppName as a unit, as the author and their out elements are siblings. I'd do something along the lines of:

import xml.dom.minidom

dom = xml.dom.minidom.parse(filename)
AppNames = dom.getElementsByTagName('AppName')

with open("author.txt", "w") as text_file:
    for AppName in AppNames:
        # An element's text lives in its child text node, hence firstChild
        authorName = AppName.getElementsByTagName("author")[0].firstChild.nodeValue
        works = AppName.getElementsByTagName("out")
        for work in works:
            workTitle = work.firstChild.nodeValue
            # One "author || out" line per out element
            text_file.write("{} || {}\n".format(authorName, workTitle))

I haven't tested this, and I tend to use ElementTree when dealing with XML, so the syntax above may not be 100% right.
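
For comparison, here is a minimal sketch of the same per-AppName pairing using ElementTree from the standard library. The <Apps> root element, the inline sample data, and the function name pair_authors_with_outputs are assumptions made for illustration; the question's excerpt does not show the real root element.

```python
import xml.etree.ElementTree as ET

# Sample data. The <Apps> root element is an assumption, since the
# question's excerpt does not show the document's root.
SAMPLE = """<Apps>
  <AppName>
    <author>Person 1</author>
    <out>Output 1</out>
    <out>Output 2</out>
  </AppName>
  <AppName>
    <author>Person 2</author>
    <out>Output 1</out>
  </AppName>
</Apps>"""


def pair_authors_with_outputs(xml_text):
    """Return one 'author || out' string per out element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    lines = []
    # Handle each AppName as a unit so an author is only paired
    # with the out elements that share the same parent.
    for app in root.iter("AppName"):
        author = app.findtext("author", default="").strip()
        for out in app.findall("out"):
            lines.append("{} || {}".format(author, (out.text or "").strip()))
    return lines


pairs = pair_authors_with_outputs(SAMPLE)
print("\n".join(pairs))
```

To read from a file instead of a string, ET.parse(filename).getroot() could replace ET.fromstring(xml_text).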

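P.S. The general convention in Python is that methods with leading underscores are not meant to be called directly. Double-underscore methods such as __getitem__ are invoked through the corresponding syntax, so write authorList[i] rather than authorList.__getitem__(i).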
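
As for the count itself, calling getElementsByTagName on a single AppName element rather than on the whole document limits the search to that element's descendants, so len() gives the number of out elements for that author. A minimal sketch with minidom follows. The <Apps> root element and the inline sample are assumptions for illustration, and out_counts is a hypothetical name.

```python
import xml.dom.minidom

# Sample data. The <Apps> root element is an assumption; the
# question's excerpt does not show the real root.
SAMPLE = """<Apps>
  <AppName>
    <author>Person 1</author>
    <out>Output 1</out>
    <out>Output 2</out>
    <out>Output 3</out>
  </AppName>
  <AppName>
    <author>Person 2</author>
    <out>Output 1</out>
    <out>Output 2</out>
    <out>Output 3</out>
    <out>Output 4</out>
  </AppName>
</Apps>"""

dom = xml.dom.minidom.parseString(SAMPLE)

out_counts = {}
for app in dom.getElementsByTagName("AppName"):
    # Plain indexing works on the returned node list; no need for __getitem__.
    author = app.getElementsByTagName("author")[0].firstChild.nodeValue
    # Searching from `app` only sees the out elements inside this AppName.
    out_counts[author] = len(app.getElementsByTagName("out"))

print(out_counts)
```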
