How do I get the xml:id of an element using ElementTree in Python?

I'm sorry if this is a really basic question, but I've been stuck on this problem for hours and just can't make it work.

I'm working with the British National Corpus (whose files are in XML format) and I want to extract the attributes of the different persons in those files. The part I'm working with is structured like this:

<bncDoc>
  <teiHeader>
    <profileDesc>
      <particDesc n="C196">
        <person ageGroup="X" xml:id="PS21Y" role="unspecified" sex="f" soc="UU" dialect="NONE" firstLang="EN-GBR" educ="X">
          <persName>j. hammond</persName>
          <occupation>interviewer</occupation>
        </person>
        <person ageGroup="X" xml:id="PS220" role="unspecified" sex="m" soc="UU" dialect="XIS" firstLang="EN-GBR" educ="X">
          <persName>Bhagan</persName>
        </person>
      </particDesc>
    </profileDesc>
  </teiHeader>
</bncDoc>

I'm trying to extract the "id", "sex", "soc", and "ageGroup" attributes of the "person" elements, but I don't know how to handle the "xml:id" attribute. My attempt (shown below) works for "sex", "soc", and "ageGroup", but not for "xml:id". Does anyone know how to make it work? That would help me a lot. :)

for i in root.findall('teiHeader/profileDesc/particDesc/person'):
    tmp = []
    tmp.append(i.get('id'))
    tmp.append(i.get('sex'))
    tmp.append(i.get('soc'))
    tmp.append(i.get('ageGroup'))

It works if you use the fully qualified attribute name:

i.get('{http://www.w3.org/XML/1998/namespace}id')

This looks a bit ugly, but it is needed because xml: is a special namespace prefix that is always bound to the http://www.w3.org/XML/1998/namespace URI, and ElementTree stores namespaced attribute names in the form {URI}localname. See https://www.w3.org/XML/1998/namespace .
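
For completeness, here is a minimal, self-contained sketch of how the loop from the question might look with the qualified name. The constants XML_NS and XML_ID, the SAMPLE string, and the helper extract_persons are names made up for this illustration, and the sample is a trimmed copy of the snippet in the question rather than a real BNC file.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predefined by the XML specification and bound to this URI.
XML_NS = "http://www.w3.org/XML/1998/namespace"
# ElementTree exposes namespaced names as "{URI}localname".
XML_ID = f"{{{XML_NS}}}id"

# Trimmed copy of the structure shown in the question (illustrative only).
SAMPLE = """<bncDoc>
  <teiHeader>
    <profileDesc>
      <particDesc n="C196">
        <person ageGroup="X" xml:id="PS21Y" role="unspecified" sex="f" soc="UU">
          <persName>j. hammond</persName>
        </person>
        <person ageGroup="X" xml:id="PS220" role="unspecified" sex="m" soc="UU">
          <persName>Bhagan</persName>
        </person>
      </particDesc>
    </profileDesc>
  </teiHeader>
</bncDoc>"""


def extract_persons(root):
    """Return [id, sex, soc, ageGroup] for every person element under root."""
    rows = []
    for person in root.findall("teiHeader/profileDesc/particDesc/person"):
        rows.append([
            person.get(XML_ID),      # xml:id needs the qualified name
            person.get("sex"),       # un-prefixed attributes work as before
            person.get("soc"),
            person.get("ageGroup"),
        ])
    return rows


root = ET.fromstring(SAMPLE)
persons = extract_persons(root)
print(persons)
```

Keeping the qualified name in one constant avoids repeating the long URI string, and for a file on disk ET.parse(path).getroot() could be used in place of ET.fromstring.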
