Python BeautifulSoup find all tags under a certain type of tag

I have an XML file with nested tags as such:

<tag1>
  <tag2>text0</tag2>
  text1
  <tag2>text2</tag2>
</tag1>
<tag2>text3</tag2>
text4
<tag1>
  <tag2>text5</tag2>
</tag1>

I want to get the content of every tag2, but only if it is contained within a tag1. In this example that means text0, text2 and text5, but not text1 (which is not inside a tag2) or text3 (whose tag2 is not inside a tag1).

I'm currently doing this with a nested for loop, but future files will have more levels of nesting and I want to avoid stacking many for loops.

Here is my code:

tag1entries = soup.find_all('tag1')
for tag1entry in tag1entries:
  tag2entries = tag1entry.find_all('tag2')
  for tag2entry in tag2entries:
    do_something(tag2entry.contents)

Does anyone know a better way?

You can use a CSS selector. For example, to select every tag2 that is a direct child of a tag1:

tag2entries = soup.select('tag1 > tag2')

or, to select every tag2 anywhere within a tag1 (at any depth):

tag2entries = soup.select('tag1 tag2')
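
As a rough, self-contained sketch of both selectors applied to the sample from the question: the snippet below assumes the bs4 package is installed and uses the built-in "html.parser" backend, which is a choice made here only because the sample has several top-level elements; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Sample document from the question (several top-level elements).
xml = """
<tag1>
  <tag2>text0</tag2>
  text1
  <tag2>text2</tag2>
</tag1>
<tag2>text3</tag2>
text4
<tag1>
  <tag2>text5</tag2>
</tag1>
"""

# Assumption: "html.parser" is used here because it tolerates
# multiple root elements; a different parser may behave differently.
soup = BeautifulSoup(xml, "html.parser")

# Child combinator: tag2 must be a direct child of tag1.
direct = [tag.get_text() for tag in soup.select("tag1 > tag2")]

# Descendant combinator: tag2 may sit at any depth below tag1.
nested = [tag.get_text() for tag in soup.select("tag1 tag2")]

print(direct)
print(nested)
```

For this particular sample both lists contain the same three entries, because every tag2 inside a tag1 happens to be a direct child. The two selectors only differ once intermediate levels appear.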

You could also use a list comprehension that keeps only the tag2 elements whose direct parent is a tag1:

entry_list = [entry.text for entry in soup.find_all('tag2') if entry.parent.name == 'tag1']

which results in:

['text0', 'text2', 'text5']
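
Note that checking entry.parent only looks one level up. Since the question mentions deeper nesting in future files, here is a hedged sketch of a variant that uses find_parent to accept a tag1 ancestor at any depth. The wrapper tag and the sample string are made up for illustration and are not part of the original data.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one tag2 is nested one level deeper via <wrapper>.
xml = (
    "<tag1><wrapper><tag2>deep</tag2></wrapper>"
    "<tag2>shallow</tag2></tag1>"
    "<tag2>outside</tag2>"
)
soup = BeautifulSoup(xml, "html.parser")

# Direct-parent check: misses the tag2 inside <wrapper>.
direct_only = [
    entry.text
    for entry in soup.find_all("tag2")
    if entry.parent.name == "tag1"
]

# Ancestor check: find_parent returns the nearest matching
# ancestor, or None when there is none.
any_depth = [
    entry.text
    for entry in soup.find_all("tag2")
    if entry.find_parent("tag1") is not None
]

print(direct_only)
print(any_depth)
```

The ancestor check corresponds to the 'tag1 tag2' descendant selector, while the direct-parent check corresponds to 'tag1 > tag2'.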
