Python Beautiful Soup Print exact TD tag

Question

I am trying to use BS4 and I want to print the exact TD tag AUD/AED from the example below. I understand that I could use some kind of indexing like [-1] to always get the last one, but in some of the other data the TD tag I want will be in the middle. Is there a way I can target the AUD/AED tag specifically?

Example:

<table class="RESULTS" width="100%">
<tr>
<th align="left">Base Currency</th>
<th align="left">Quote Currency</th>
<th align="left">Instrument</th>
<th align="left">Spot Date</th>
</tr>
<tr>
<td>AUD</td>
<td>AED</td>
<td>AUD/AED</td>
<td>Wednesday 23 APR 2014</td>
</tr>
</table>

Code I am using to get this:

from bs4 import BeautifulSoup

soup = BeautifulSoup(r, "html.parser")  # r is the HTML of the page
table = soup.find(attrs={"class": "RESULTS"})
print(table)
days = table.find_all('tr')

From this I can get the last TR tag with [-1], but I need to find the TR tag that contains the TD tag AUD/AED.

I am looking for something like:

if td[2] == <td>AUD/AED</td>:
    print(tr[-1])

Answer 1

This sort of thing is much (much) cleaner if you have a CSS selector to go off of, but it looks like we can't do that here.

The next-best alternative is just to explicitly find the tag you want:

soup.find(class_='RESULTS').find(text='AUD/AED')

And then navigate from there using the bs4 API.

import re

# .parent is the enclosing td, .parent.parent is the enclosing tr
tr = soup.find(class_='RESULTS').find(text='AUD/AED').parent.parent

tr.find(text=re.compile(r'\w+ \d{1,2} \w+ \d{4}'))
# 'Wednesday 23 APR 2014'

This sort of approach makes no assumptions about the layout of the tr's children. It just searches the row that contains the AUD/AED tag for text that looks like a date (according to the regex).
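A minimal, self-contained sketch of the same idea, in case it helps to see it end to end. It assumes a reasonably recent bs4 (string= is the newer spelling of text=), uses find_parent('tr') instead of chaining .parent, and adds a second AUD/USD row that is made up purely so the lookup has something to skip:

```python
from bs4 import BeautifulSoup

# Sample markup from the question, plus one invented AUD/USD row.
html = """
<table class="RESULTS" width="100%">
<tr><th>Base Currency</th><th>Quote Currency</th><th>Instrument</th><th>Spot Date</th></tr>
<tr><td>AUD</td><td>USD</td><td>AUD/USD</td><td>Tuesday 22 APR 2014</td></tr>
<tr><td>AUD</td><td>AED</td><td>AUD/AED</td><td>Wednesday 23 APR 2014</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="RESULTS")

# Find the <td> whose only text is exactly 'AUD/AED'.
match = table.find("td", string="AUD/AED")

# Climb to the enclosing row, wherever it sits in the table.
row = match.find_parent("tr") if match is not None else None

# Collect the text of every cell in that row.
row_values = [td.get_text(strip=True) for td in row.find_all("td")] if row else []
print(row_values)
```

Because the match is made on the cell text rather than on position, it should not matter whether the row is first, last, or in the middle.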

Answer 2

Something like this? Assuming soup is your table.

cells = soup.find_all('td')
# stop one short of the end so cells[index + 1] always exists
for index, cell in enumerate(cells[:-1]):
    if cell.text == u'AUD/AED':
        # the cell right after the match holds the value we want
        print(cells[index + 1].text)
        break
else:
    # the loop finished without hitting break
    print("cell not found")
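A row-by-row variant may sit closer to the `if td[2] == ...` idea in the question. This is only a sketch: find_row_by_cell, its parameters, and the extra AUD/USD row are made up for illustration and are not part of bs4 or the original data.

```python
from bs4 import BeautifulSoup


def find_row_by_cell(table_html, column, wanted):
    """Return the cell texts of the first row whose cell at `column` equals `wanted`.

    Hypothetical helper; returns None when no row matches.
    """
    soup = BeautifulSoup(table_html, "html.parser")
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # Header rows contain only <th> cells, so `cells` is empty and is skipped.
        if len(cells) > column and cells[column] == wanted:
            return cells
    return None


sample = """
<table class="RESULTS">
<tr><th>Base Currency</th><th>Quote Currency</th><th>Instrument</th><th>Spot Date</th></tr>
<tr><td>AUD</td><td>USD</td><td>AUD/USD</td><td>Tuesday 22 APR 2014</td></tr>
<tr><td>AUD</td><td>AED</td><td>AUD/AED</td><td>Wednesday 23 APR 2014</td></tr>
</table>
"""

row = find_row_by_cell(sample, 2, "AUD/AED")
print(row[3] if row else "cell not found")
```

Working per row keeps the related cells together, so the date is read from the same row as the match rather than from whatever cell happens to come next in a flat list.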

Answer 3

I'd probably use lxml and XPath:

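from io import StringIO  # on Python 2: from StringIO import StringIO
from lxml import etree

# str() lets this work whether table is an HTML string or a bs4 Tag
tree = etree.parse(StringIO(str(table)), etree.HTMLParser())
d = tree.xpath("//table[@class='RESULTS']/tr[./td[3][text()='AUD/AED']]/td[4]/text()")[0]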

The variable d should contain the string "Wednesday 23 APR 2014".

If you really want BeautifulSoup, you can mix lxml and BeautifulSoup, no problem.
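One possible way to mix the two, sketched under the assumption that both bs4 and lxml are installed: locate the table with BeautifulSoup, then hand its markup to lxml for the XPath query. The variable names are illustrative only.

```python
from bs4 import BeautifulSoup
from lxml import etree

html = """
<table class="RESULTS" width="100%">
<tr><th>Base Currency</th><th>Quote Currency</th><th>Instrument</th><th>Spot Date</th></tr>
<tr><td>AUD</td><td>AED</td><td>AUD/AED</td><td>Wednesday 23 APR 2014</td></tr>
</table>
"""

# Step 1: use BeautifulSoup to pick out the table.
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="RESULTS")

# Step 2: str(table) serialises the bs4 Tag back to markup that lxml can parse.
root = etree.fromstring(str(table), etree.HTMLParser())

# Step 3: select the 4th cell of the row whose 3rd cell is 'AUD/AED'.
dates = root.xpath("//tr[td[3][text()='AUD/AED']]/td[4]/text()")
spot_date = dates[0].strip() if dates else None
print(spot_date)
```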

3 answers

solution1: 1 ACCEPTED 2014-04-17 18:20:39
solution2: 0 2014-04-17 18:18:26
solution3: 0 2014-04-17 18:26:17
