I am trying to find all children of the parent node SPEECH, including their text values and not just their tags.
<SPEECH>
<SPEAKER>PHILO</SPEAKER>
<LINE>Nay, but this dotage of our general's</LINE>
<LINE>O'erflows the measure: those his goodly eyes,</LINE>
<LINE>That o'er the files and musters of the war</LINE>
<LINE>Have glow'd like plated Mars, now bend, now turn,</LINE>
<LINE>The office and devotion of their view</LINE>
<LINE>Upon a tawny front: his captain's heart,</LINE>
<LINE>Which in the scuffles of great fights hath burst</LINE>
<LINE>The buckles on his breast, reneges all temper,</LINE>
<LINE>And is become the bellows and the fan</LINE>
<LINE>To cool a gipsy's lust.</LINE>
<STAGEDIR>Flourish. Enter ANTONY, CLEOPATRA, her Ladies,
the Train, with Eunuchs fanning her</STAGEDIR>
<LINE>Look, where they come:</LINE>
<LINE>Take but good note, and you shall see in him.</LINE>
<LINE>The triple pillar of the world transform'd</LINE>
<LINE>Into a strumpet's fool: behold and see.</LINE>
</SPEECH>
This is what I have right now:
import xml.etree.ElementTree as ET

tree_a_and_c = ET.parse('shakespeare/a_and_c.xml')
root_a_and_c = tree_a_and_c.getroot()
a_and_c_corpus = []
# walk four levels down from the root and collect every element found there
for child in root_a_and_c:
    for child1 in child:
        for child2 in child1:
            for child3 in child2:
                a_and_c_corpus.append(child3)
print(a_and_c_corpus)
Output
[<Element 'SPEAKER' at 0x1280a3510>, <Element 'LINE' at 0x1280a23e0>, <Element 'LINE' at 0x1280a27f0>, <Element 'LINE' at 0x1280a1120>, <Element 'LINE' at 0x1280a32e0>, <Element 'LINE' at 0x1280a2c00>, <Element 'LINE' at 0x1280a3ab0>, <Element 'LINE' at 0x1280a3100>, <Element 'LINE' at 0x1280a3060>, <Element 'LINE' at 0x1280a3420>,
The problem is that I want to iterate through every SPEECH and compare its SPEAKER element to a name; if the name matches, I want to append all of that speech's LINE elements to a list. In other words, I want to either split the list into one list per SPEAKER, or somehow findall() the parent and then get the values of that parent's children. How can I do this?
Although you tagged this question with ElementTree, I would use lxml, given its better support for XPath.
So I would suggest this:
from lxml import etree
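Change your first two lines:
tree_a_and_c = ET.parse('shakespeare/a_and_c.xml')
root_a_and_c = tree_a_and_c.getroot()
to
tree_a_and_c = etree.parse('shakespeare/a_and_c.xml')
root = tree_a_and_c.getroot()
(note that etree.XML() expects a string of XML, not a file path, so etree.parse() is what you want here) and continue this way: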
# create a dictionary where each key is a speaker's name and each value is a list of that speaker's lines
a_and_c_corpus = {}
# get all unique speakers
speakers = set(root.xpath('.//SPEAKER/text()'))
# now update the dictionary
for speaker in speakers:
    a_and_c_corpus[speaker] = root.xpath(f'//SPEECH[./SPEAKER[.="{speaker}"]]//LINE/text()')
for sp in a_and_c_corpus.items():
    print(sp)
Output
('VENTIDIUS', ['Now, darting Parthia, art thou struck; and now', "Pleased fortune does of Marcus Crassus' death", "Make me revenger. Bear the king's son's body", 'Before our army. Thy Pacorus, Orodes,', ...])
--------
('Soothsayer', ['Your will?', "In nature's infinite book of secrecy", 'A little I can read.', 'I make not, but foresee.', 'You shall be yet far fairer than you are.', ....])
--------
('CANIDIUS', ['Why will my lord do so?', 'Ay, and to wage this battle at Pharsalia.', 'Where Caesar fought with Pompey: but these offers,', 'Which serve not for his vantage, be shakes off;', ....])
--------
etc.
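If you would rather stay with the standard-library ElementTree you tagged, the same grouping can probably be done without XPath predicates. The following is only a minimal sketch: the inline SAMPLE document is an abbreviated stand-in for shakespeare/a_and_c.xml, its PLAY/ACT/SCENE nesting is an assumption about the file's layout, and lines_by_speaker is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abbreviated stand-in for the real file; the nesting above SPEECH is assumed.
SAMPLE = """<PLAY>
<ACT>
<SCENE>
<SPEECH>
<SPEAKER>PHILO</SPEAKER>
<LINE>Nay, but this dotage of our general's</LINE>
<STAGEDIR>Flourish.</STAGEDIR>
<LINE>Look, where they come:</LINE>
</SPEECH>
<SPEECH>
<SPEAKER>CLEOPATRA</SPEAKER>
<LINE>If it be love indeed, tell me how much.</LINE>
</SPEECH>
<SPEECH>
<SPEAKER>PHILO</SPEAKER>
<LINE>Sir, sometimes, when he is not Antony,</LINE>
</SPEECH>
</SCENE>
</ACT>
</PLAY>"""


def lines_by_speaker(root):
    """Map each speaker's name to the text of all LINE elements in their speeches."""
    corpus = defaultdict(list)
    # iter() finds SPEECH elements at any depth, so no nested loops are needed
    for speech in root.iter('SPEECH'):
        # keep only LINE elements that have leading text; STAGEDIR is skipped
        lines = [line.text for line in speech.iter('LINE') if line.text]
        # a speech may list more than one SPEAKER; credit the lines to each
        for speaker in speech.findall('SPEAKER'):
            if speaker.text:
                corpus[speaker.text].extend(lines)
    return dict(corpus)


root = ET.fromstring(SAMPLE)
corpus = lines_by_speaker(root)
print(corpus['PHILO'])
```

For the real file you would replace ET.fromstring(SAMPLE) with ET.parse('shakespeare/a_and_c.xml').getroot(). Because this walks each SPEECH once instead of running one XPath query per speaker, it also avoids building a query string from the speaker's name, which would break if a name ever contained a double quote.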