Python BeautifulSoup find all tags under a certain type of tag

Question

I have an XML file with nested tags like this:

<tag1>
  <tag2>text0</tag2>
  text1
  <tag2>text2</tag2>
</tag1>
<tag2>text3</tag2>
text4
<tag1>
  <tag2>text5</tag2>
</tag1>

I want to get the content of every tag2, but only if it is contained within a tag1. In this example that means text0, text2 and text5, but not text1 (which is not inside a tag2) or text3 (whose tag2 is not inside a tag1).

I'm currently doing this with a double for loop, but upcoming files will have more levels of nesting, and I want to avoid nesting many for loops.

Here is my code:

tag1entries = soup.find_all('tag1')
for tag1entry in tag1entries:
  tag2entries = tag1entry.find_all('tag2')
  for tag2entry in tag2entries:
    do_something(tag2entry.contents)

Does anyone know a better way?

Answer 1

You can use a CSS selector. For example, to select every tag2 that is a direct child of a tag1:

tag2entries = soup.select('tag1 > tag2')

Or, to select every tag2 anywhere within a tag1, at any depth:

tag2entries = soup.select('tag1 tag2')
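
As a rough, self-contained sketch of the difference between the two selectors: the snippet below runs both on the sample from the question and on a second, made-up document in which tag2 is wrapped in an extra element. The parser choice (the built-in html.parser), the wrapper tag name group, and all variable names are assumptions for illustration; the question does not say which parser is in use.

```python
from bs4 import BeautifulSoup

# Sample from the question. "html.parser" is an assumption, not the asker's setup.
xml = """<tag1>
  <tag2>text0</tag2>
  text1
  <tag2>text2</tag2>
</tag1>
<tag2>text3</tag2>
text4
<tag1>
  <tag2>text5</tag2>
</tag1>"""
soup = BeautifulSoup(xml, "html.parser")

# Child combinator: tag2 must be a direct child of tag1.
direct = [t.get_text() for t in soup.select("tag1 > tag2")]
# Descendant combinator: tag2 may sit at any depth below tag1.
nested = [t.get_text() for t in soup.select("tag1 tag2")]

# Hypothetical deeper document: <group> is an invented wrapper element.
deep_soup = BeautifulSoup("<tag1><group><tag2>deep</tag2></group></tag1>", "html.parser")
deep_direct = [t.get_text() for t in deep_soup.select("tag1 > tag2")]
deep_nested = [t.get_text() for t in deep_soup.select("tag1 tag2")]

print(direct, nested)            # same result on the question's sample
print(deep_direct, deep_nested)  # only the descendant selector finds the wrapped tag2
```

For files with several levels of nesting, the descendant form ('tag1 tag2') is probably the one that avoids the nested loops, since it does not care how many elements sit between tag1 and tag2.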

Answer 2

You could use a list comprehension that keeps only the tag2 elements whose direct parent is a tag1:

entry_list = [entry.text for entry in soup.find_all('tag2') if entry.parent.name == 'tag1']

which results in:

['text0', 'text2', 'text5']
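
Note that checking entry.parent only matches a tag2 sitting directly under a tag1. If the deeper files mentioned in the question put other elements in between, one possible variation is to test for any tag1 ancestor with find_parent. The sketch below uses an invented sample document (the group wrapper and the texts a, b, c are hypothetical) and assumes the built-in html.parser:

```python
from bs4 import BeautifulSoup

# Hypothetical sample: "a" is a direct child of tag1, "b" is nested one level
# deeper inside an invented <group> wrapper, and "c" is outside any tag1.
xml = "<tag1><tag2>a</tag2><group><tag2>b</tag2></group></tag1><tag2>c</tag2>"
soup = BeautifulSoup(xml, "html.parser")  # parser choice is an assumption

# Direct parent check, as in the answer above: finds only "a".
direct_only = [e.text for e in soup.find_all("tag2") if e.parent.name == "tag1"]

# Ancestor check: find_parent returns the nearest enclosing tag1, or None.
any_depth = [e.text for e in soup.find_all("tag2") if e.find_parent("tag1") is not None]

print(direct_only)
print(any_depth)
```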

2 answers

solution1
2 ACCEPTED 2016-05-31 09:25:03

solution2
0 2016-05-31 09:32:10
