I am trying to extract some information from a TEI file using this code:
import xml.etree.ElementTree as ET

tree = ET.parse(path)
root = tree.getroot()
body = root.find("{http://www.tei-c.org/ns/1.0}text/{http://www.tei-c.org/ns/1.0}body")
for s in body.iter("{http://www.tei-c.org/ns/1.0}s"):
    for w in s.iter("{http://www.tei-c.org/ns/1.0}w"):
        wordpart = w.find("{http://www.tei-c.org/ns/1.0}seg")
        word = ''.join(wordpart.itertext())
        type = w.get('type')
        xml = w.get('xml:id')
        print(type)
        print(xml)
The output for type is correct; it prints e.g. "noun". But for xml:id I only get None. This is an extract of the XML file I need to parse:
<w type="noun" xml:id="w.4940"><seg type="orth">sloterheighe</seg>...
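The xml prefix is reserved and always bound to the namespace http://www.w3.org/XML/1998/namespace, and ElementTree stores namespaced attribute names as {uri}localname, so the literal key 'xml:id' never matches. To get the value of the xml:id attribute, you need to specify the namespace URI like this (see this SO post for more details):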
xml = w.attrib['{http://www.w3.org/XML/1998/namespace}id']
or
xml = w.get('{http://www.w3.org/XML/1998/namespace}id')
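A quick way to see why w.get('xml:id') returns None is to print the attribute keys ElementTree actually stores. The sketch below parses a small hand-made fragment; the closing tags and the surrounding <TEI> element are assumptions added to the extract from the question so that it parses on its own, and the names SAMPLE, XML_NS and TEI_NS are purely illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modeled on the question's extract; the closing tags
# and the <TEI> wrapper are assumptions added so the string parses standalone.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <w type="noun" xml:id="w.4940"><seg type="orth">sloterheighe</seg></w>
</TEI>"""

XML_NS = "http://www.w3.org/XML/1998/namespace"
TEI_NS = "http://www.tei-c.org/ns/1.0"

root = ET.fromstring(SAMPLE)
w = root.find(f"{{{TEI_NS}}}w")  # the <w> element is a direct child of <TEI>

# The reserved "xml" prefix is expanded, so the stored key is in {uri}local form.
print(sorted(w.attrib))          # ['type', '{http://www.w3.org/XML/1998/namespace}id']
print(w.get("xml:id"))           # None: the prefixed name is never used as a key
print(w.get(f"{{{XML_NS}}}id"))  # w.4940
```

Printing w.attrib like this is also a handy way to find the right key for any other namespaced attribute in the file.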
Also, note that type is a Python built-in (the type class), so avoid using it as a variable name: assigning to it shadows the built-in for the rest of that scope.
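Combining both points, the loop from the question could be rewritten roughly as below. This is a sketch under a few assumptions: the inline SAMPLE document stands in for the real file (which would be loaded with ET.parse(path)), and the names TEI, XML_ID, extract_words, w_type and w_id are hypothetical choices, not part of any API.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical minimal TEI document; the real file would be read with ET.parse(path).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <s><w type="noun" xml:id="w.4940"><seg type="orth">sloterheighe</seg></w></s>
  </body></text>
</TEI>"""


def extract_words(root):
    """Return (word, word type, xml:id) tuples for every <w> inside an <s> in the body."""
    body = root.find(f"{TEI}text/{TEI}body")
    results = []
    for s in body.iter(f"{TEI}s"):
        for w in s.iter(f"{TEI}w"):
            seg = w.find(f"{TEI}seg")
            # Guard against <w> elements without a <seg> child.
            word = "".join(seg.itertext()) if seg is not None else ""
            w_type = w.get("type")  # renamed so the built-in type() stays usable
            w_id = w.get(XML_ID)    # namespaced key instead of "xml:id"
            results.append((word, w_type, w_id))
    return results


for word, w_type, w_id in extract_words(ET.fromstring(SAMPLE)):
    print(word, w_type, w_id)  # sloterheighe noun w.4940
```

With a real file, passing ET.parse(path).getroot() to extract_words should work the same way, since getroot() returns the same kind of Element as ET.fromstring.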