I have an XML file with nested tags like this:
<tag1>
<tag2>text0</tag2>
text1
<tag2>text2</tag2>
</tag1>
<tag2>text3</tag2>
text4
<tag1>
<tag2>text5</tag2>
</tag1>
I want to get the content of all the tag2 elements, but only if they are contained within a tag1. So in this example: text0, text2 and text5, and not text1.
I'm currently doing this with a double for loop, but upcoming files will have more levels of nesting and I want to avoid nesting many for loops.
Here is my code:
tag1entries = soup.find_all('tag1')
for tag1entry in tag1entries:
    tag2entries = tag1entry.find_all('tag2')
    for tag2entry in tag2entries:
        do_something(tag2entry.contents)
Does anyone know a better way?
You can use a CSS selector. For example, to select every tag2 that is a direct child of a tag1:
tag2entries = soup.select('tag1 > tag2')
or, to select every tag2 anywhere within a tag1:
tag2entries = soup.select('tag1 tag2')
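To make the difference between the two selectors concrete, here is a minimal sketch. It reuses the sample data from the question and adds one hypothetical deeper level (the wrap element and its "deep" text are my own additions, not part of the original file). It also assumes the built-in html.parser is good enough for this snippet; a real XML file may call for a different parser.

```python
from bs4 import BeautifulSoup

# Sample data from the question, plus a hypothetical <wrap> level
# (not in the original) to show what happens with deeper nesting.
doc = """
<tag1>
<tag2>text0</tag2>
text1
<tag2>text2</tag2>
<wrap><tag2>deep</tag2></wrap>
</tag1>
<tag2>text3</tag2>
text4
<tag1>
<tag2>text5</tag2>
</tag1>
"""

soup = BeautifulSoup(doc, "html.parser")

# 'tag1 > tag2' matches only tag2 elements whose direct parent is a tag1.
direct = [t.get_text() for t in soup.select("tag1 > tag2")]

# 'tag1 tag2' matches tag2 elements at any depth below a tag1.
nested = [t.get_text() for t in soup.select("tag1 tag2")]

print(direct)
print(nested)
```

With multiple levels, the descendant form ('tag1 tag2') is probably the one you want, since it does not care how many elements sit between the tag1 and the tag2.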
You could use a list comprehension that keeps only the tag2 elements whose direct parent is a tag1:
entry_list = [entry.text for entry in soup.find_all('tag2') if entry.parent.name == 'tag1']
which results in:
['text0', 'text2', 'text5']
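Since entry.parent only looks one level up, this would miss a tag2 that sits deeper inside a tag1. One possible variation, sketched here on a small made-up document (the wrap element and the "deep" text are hypothetical), is to use find_parent, which walks up through all ancestors:

```python
from bs4 import BeautifulSoup

# Hypothetical input: one tag2 nested two levels below tag1,
# one directly below tag1, and one outside any tag1.
doc = (
    "<tag1><wrap><tag2>deep</tag2></wrap><tag2>text0</tag2></tag1>"
    "<tag2>text3</tag2>"
)
soup = BeautifulSoup(doc, "html.parser")

# Direct-parent check, as in the comprehension above.
direct_only = [
    entry.text
    for entry in soup.find_all("tag2")
    if entry.parent.name == "tag1"
]

# Ancestor check: find_parent returns None when no tag1 encloses the entry.
any_depth = [
    entry.text
    for entry in soup.find_all("tag2")
    if entry.find_parent("tag1") is not None
]

print(direct_only)
print(any_depth)
```

Here direct_only keeps just the tag2 directly under tag1, while any_depth also keeps the nested one; the tag2 outside tag1 is dropped in both cases.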