How to get a nested element in beautiful soup

Question

I am struggling with the syntax required to grab some hrefs in a td. The table, tr and td elements don't have any classes or ids.

If I wanted to grab the anchor in this example, what would I need?

<tr><td><a>...

Thanks

Answer 1

As per the docs, you first make a parse tree:

# Beautiful Soup 3 (Python 2) module; Beautiful Soup 4 lives in the bs4 package instead
import BeautifulSoup
html = "<html><body><tr><td><a href='foo'/></td></tr></body></html>"
soup = BeautifulSoup.BeautifulSoup(html)

and then you search it, for example for <a> tags whose immediate parent is a <td>:

for ana in soup.findAll('a'):
  if ana.parent.name == 'td':
    print ana["href"]
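For anyone on Python 3, here is a minimal sketch of the same search, assuming Beautiful Soup 4 (the `bs4` package) is installed and using the standard library's `html.parser` backend. The sample markup and the helper names `hrefs_with_td_parent` and `hrefs_via_selector` are made up for illustration. The second helper uses the CSS child combinator `td > a`, which expresses the same immediate-parent condition.

```python
from bs4 import BeautifulSoup

# Illustrative markup: one link directly inside a <td>, one inside a <p>.
html = (
    "<html><body><table><tr><td><a href='foo'></a></td></tr></table>"
    "<p><a href='bar'></a></p></body></html>"
)


def hrefs_with_td_parent(markup):
    """Return hrefs of <a> tags whose immediate parent is a <td>."""
    soup = BeautifulSoup(markup, "html.parser")
    return [
        a["href"]
        for a in soup.find_all("a")
        if a.parent is not None and a.parent.name == "td" and a.has_attr("href")
    ]


def hrefs_via_selector(markup):
    """Same result using a CSS selector with the child combinator."""
    soup = BeautifulSoup(markup, "html.parser")
    return [a["href"] for a in soup.select("td > a[href]")]


print(hrefs_with_td_parent(html))
print(hrefs_via_selector(html))
```

Both helpers skip the link inside the `<p>`, since its parent is not a `<td>`.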

Answer 2

Something like this?

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(html)
anchors = [td.find('a') for td in soup.findAll('td')]

That should find the first "a" inside each "td" in the HTML you provide. You can tweak td.find to be more specific, or use findAll if you have several links inside each td.
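A rough sketch of the "several links inside each td" case, assuming Beautiful Soup 4 on Python 3 (where findAll is spelled `find_all`). The sample markup and the helper name `links_per_cell` are hypothetical, not from the question.

```python
from bs4 import BeautifulSoup

# Illustrative row: two links in the first cell, none in the second,
# one in the third.
sample = (
    "<table><tr>"
    "<td><a href='a1'>one</a> <a href='a2'>two</a></td>"
    "<td>no link</td>"
    "<td><a href='b1'>three</a></td>"
    "</tr></table>"
)


def links_per_cell(markup):
    """Return one list of hrefs per <td>, in document order."""
    soup = BeautifulSoup(markup, "html.parser")
    return [
        [a.get("href") for a in td.find_all("a")]
        for td in soup.find_all("td")
    ]


print(links_per_cell(sample))
```

A cell without links simply yields an empty list, so no None check is needed in this variant.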

UPDATE: re Daniele's comment, if you want to make sure there are no None values in the list, you could modify the list comprehension like this:

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(html)
anchors = [a for a in (td.find('a') for td in soup.findAll('td')) if a]

This just adds a check that td.find('a') actually returned an element.
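To see why the check matters, here is a small sketch assuming Beautiful Soup 4 on Python 3. The sample markup is made up: the second cell has no link, so `find` returns None for it. The comparison `is not None` is used here as a more explicit spelling of the same filter.

```python
from bs4 import BeautifulSoup

# Illustrative row: the second cell contains no <a> at all.
sample = "<table><tr><td><a href='x'>x</a></td><td>plain text</td></tr></table>"
soup = BeautifulSoup(sample, "html.parser")

# One entry per <td>; cells without a link contribute None.
unfiltered = [td.find("a") for td in soup.find_all("td")]

# Keep only the cells where a link was actually found.
filtered = [a for a in unfiltered if a is not None]

print(len(unfiltered), len(filtered))
print([a["href"] for a in filtered])
```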

solution1
27 ACCEPTED 2009-06-29 14:37:27

solution2
24 2009-06-29 14:37:15
