Get all children of parent with specific tag in elementtree

Given the parent node SPEECH, I am trying to get all of its children, including their text values and not just their tags.

<SPEECH>
   <SPEAKER>PHILO</SPEAKER>
   <LINE>Nay, but this dotage of our general's</LINE>
   <LINE>O'erflows the measure: those his goodly eyes,</LINE>
   <LINE>That o'er the files and musters of the war</LINE>
   <LINE>Have glow'd like plated Mars, now bend, now turn,</LINE>
   <LINE>The office and devotion of their view</LINE>
   <LINE>Upon a tawny front: his captain's heart,</LINE>
   <LINE>Which in the scuffles of great fights hath burst</LINE>
   <LINE>The buckles on his breast, reneges all temper,</LINE>
   <LINE>And is become the bellows and the fan</LINE>
   <LINE>To cool a gipsy's lust.</LINE>
   <STAGEDIR>Flourish. Enter ANTONY, CLEOPATRA, her Ladies,
   the Train, with Eunuchs fanning her</STAGEDIR>
   <LINE>Look, where they come:</LINE>
   <LINE>Take but good note, and you shall see in him.</LINE>
   <LINE>The triple pillar of the world transform'd</LINE>
   <LINE>Into a strumpet's fool: behold and see.</LINE>
</SPEECH>

This is what I have right now:

import xml.etree.ElementTree as ET

tree_a_and_c = ET.parse('shakespeare/a_and_c.xml')
root_a_and_c = tree_a_and_c.getroot()

a_and_c_corpus = []

for child in root_a_and_c:
    for child1 in child:
        for child2 in child1:
            for child3 in child2:
                a_and_c_corpus.append(child3)

print(a_and_c_corpus)

Output

[<Element 'SPEAKER' at 0x1280a3510>, <Element 'LINE' at 0x1280a23e0>, <Element 'LINE' at 0x1280a27f0>, <Element 'LINE' at 0x1280a1120>, <Element 'LINE' at 0x1280a32e0>, <Element 'LINE' at 0x1280a2c00>, <Element 'LINE' at 0x1280a3ab0>, <Element 'LINE' at 0x1280a3100>, <Element 'LINE' at 0x1280a3060>, <Element 'LINE' at 0x1280a3420>, 

The problem is that I want to iterate through every SPEECH and compare its SPEAKER element to a name; if the name matches, I want to append all of that speech's LINE values to a list. That is, I want either to split the list into one list per SPEAKER, or to somehow findall(parent) and then read the values of that parent's children. How can I do this?

Although you tagged this question with ElementTree, I would use lxml, given its fuller XPath support.

So I would suggest this:

from lxml import etree

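Change your parsing lines from

tree_a_and_c = ET.parse('shakespeare/a_and_c.xml')
root_a_and_c = tree_a_and_c.getroot()

to

# etree.XML() parses an XML string, not a file path, so use etree.parse() for a file
tree_a_and_c = etree.parse('shakespeare/a_and_c.xml')
root = tree_a_and_c.getroot()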

and continue this way:

# Create a dictionary where each key is a speaker's name and each value is a list of that speaker's lines
a_and_c_corpus = {}

# Get all unique speakers
speakers = set(root.xpath('.//SPEAKER/text()'))

# Now fill the dictionary, one XPath query per speaker
for speaker in speakers:
    a_and_c_corpus[speaker] = root.xpath(f'//SPEECH[./SPEAKER[.="{speaker}"]]//LINE/text()')

for sp in a_and_c_corpus.items():
    print(sp)

Output

('VENTIDIUS', ['Now, darting Parthia, art thou struck; and now', "Pleased fortune does of Marcus Crassus' death", "Make me revenger. Bear the king's son's body", 'Before our army. Thy Pacorus, Orodes,', ...])
--------
('Soothsayer', ['Your will?', "In nature's infinite book of secrecy", 'A little I can read.', 'I make not, but foresee.', 'You shall be yet far fairer than you are.', ....])
--------
('CANIDIUS', ['Why will my lord do so?', 'Ay, and to wage this battle at Pharsalia.', 'Where Caesar fought with Pompey: but these offers,', 'Which serve not for his vantage, be shakes off;', ....])
--------

etc.
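
If you would rather stay with the standard library, the same grouping can probably be done with xml.etree.ElementTree alone. The following is only a minimal sketch: the SAMPLE string is a made-up stand-in for shakespeare/a_and_c.xml, the PLAY/ACT/SCENE nesting is an assumption about the file layout, and lines_by_speaker is a hypothetical helper name. It walks every SPEECH once, reads the SPEAKER text, and collects the text of the LINE children. If a SPEECH has more than one SPEAKER, only the first is used here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up sample standing in for shakespeare/a_and_c.xml (assumed nesting).
SAMPLE = """
<PLAY>
  <ACT>
    <SCENE>
      <SPEECH>
        <SPEAKER>PHILO</SPEAKER>
        <LINE>Nay, but this dotage of our general's</LINE>
        <LINE>O'erflows the measure: those his goodly eyes,</LINE>
      </SPEECH>
      <SPEECH>
        <SPEAKER>CLEOPATRA</SPEAKER>
        <LINE>If it be love indeed, tell me how much.</LINE>
      </SPEECH>
      <SPEECH>
        <SPEAKER>PHILO</SPEAKER>
        <LINE>Look, where they come:</LINE>
      </SPEECH>
    </SCENE>
  </ACT>
</PLAY>
"""


def lines_by_speaker(root):
    """Map each speaker's name to the text of all LINE children of their speeches."""
    corpus = defaultdict(list)
    # iter() finds SPEECH elements at any depth, so no nested loops are needed.
    for speech in root.iter('SPEECH'):
        speaker = speech.findtext('SPEAKER')  # text of the first SPEAKER child, or None
        if speaker is None:
            continue
        for line in speech.findall('LINE'):   # direct LINE children only
            if line.text:
                corpus[speaker].append(line.text)
    return dict(corpus)


root = ET.fromstring(SAMPLE)
corpus = lines_by_speaker(root)

# For a single speaker, ElementTree's limited XPath subset is enough:
philo_lines = [el.text for el in root.findall(".//SPEECH[SPEAKER='PHILO']/LINE")]
```

With a real file you would replace ET.fromstring(SAMPLE) with ET.parse('shakespeare/a_and_c.xml').getroot(). Unlike the per-speaker XPath queries above, this makes a single pass over the document.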
