I am trying to use BS4, and I want to print the exact TD tag AUD/AED from the example below. I understand that I could use some sort of indexing like [-1] to always get the last one, but in some of the other data the TD tag I want will be in the middle. Is there a way I can select the AUD/AED tag specifically?
Example:
<table class="RESULTS" width="100%">
<tr>
<th align="left">Base Currency</th>
<th align="left">Quote Currency</th>
<th align="left">Instrument</th>
<th align="left">Spot Date</th>
</tr>
<tr>
<td>AUD</td>
<td>AED</td>
<td>AUD/AED</td>
<td>Wednesday 23 APR 2014</td>
</tr>
</table>
Code I am using to get this:
from bs4 import BeautifulSoup

soup = BeautifulSoup(r, "html.parser")  # r holds the page's HTML
table = soup.find(attrs={"class": "RESULTS"})
print(table)
days = table.find_all('tr')
This gets all of the TR tags (and days[-1] gives the last one), but I need to find the TR tag that contains the TD tag AUD/AED.
I am looking for something like:
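for tr in days:
    td = tr.find_all('td')
    # third cell holds the instrument, last cell holds the spot date
    if len(td) > 2 and td[2].text == 'AUD/AED':
        print(td[-1].text)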
This sort of thing is much (much) cleaner if you have a CSS selector to go off of, but it looks like we can't do that here.
The next-best alternative is just to explicitly find the tag you want:
soup.find(class_='RESULTS').find(string='AUD/AED')  # string= is the current name for the older text= argument
And then navigate from there using the bs4 API.
import re

# the string's parent is the <td>; the <td>'s parent is the <tr>
tr = soup.find(class_='RESULTS').find(string='AUD/AED').parent.parent
tr.find(string=re.compile(r'\w+ \d{1,2} \w+ \d{4}'))
# -> 'Wednesday 23 APR 2014'
This approach makes no assumptions about the layout of tr's children; it just looks for a sibling of the AUD/AED cell that looks like a date (according to the regex).
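A variation on the same idea, sketched under the assumption that you also want the other cells of that row: use find_parent('tr') instead of chaining .parent.parent, then pair each cell with its column header so the date can be read by column name rather than by position. The html string is just the example table from the question, and the helper name row_for is hypothetical.

```python
from bs4 import BeautifulSoup

html = """
<table class="RESULTS" width="100%">
<tr>
<th align="left">Base Currency</th>
<th align="left">Quote Currency</th>
<th align="left">Instrument</th>
<th align="left">Spot Date</th>
</tr>
<tr>
<td>AUD</td>
<td>AED</td>
<td>AUD/AED</td>
<td>Wednesday 23 APR 2014</td>
</tr>
</table>
"""


def row_for(soup, instrument):
    """Return the row containing `instrument` as a {header: cell text} dict, or None."""
    table = soup.find("table", class_="RESULTS")
    # Look for the text itself, whichever column it happens to be in.
    hit = table.find(string=instrument)
    if hit is None:
        return None
    tr = hit.find_parent("tr")
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    # Pair each header with the cell in the same column.
    return dict(zip(headers, cells))


soup = BeautifulSoup(html, "html.parser")
row = row_for(soup, "AUD/AED")
print(row["Spot Date"])  # Wednesday 23 APR 2014
```

Because the lookup is by header name, it keeps working if the columns are reordered, as long as the header row and data rows line up.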
Something like this? This assumes soup is your table.
cellIndex = 0
cells = soup.find_all('td')
while cellIndex < len(cells):
    if cells[cellIndex].text == u'AUD/AED':
        # the cell we want is the one right after AUD/AED
        desiredIndex = cellIndex + 1
        break
    cellIndex += 1
# AUD/AED was found, and it is not the last cell
if cellIndex != len(cells) and desiredIndex < len(cells):
    print(cells[desiredIndex].text)
else:
    print("cell not found")
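If the value you want always sits in the cell immediately after AUD/AED (as the loop above assumes), bs4 can step there directly with find_next_sibling('td'), which avoids the index bookkeeping. A minimal sketch, using a trimmed copy of the question's table as input; cell_after is a hypothetical helper name:

```python
from bs4 import BeautifulSoup

html = """
<table class="RESULTS">
<tr><td>AUD</td><td>AED</td><td>AUD/AED</td><td>Wednesday 23 APR 2014</td></tr>
</table>
"""


def cell_after(soup, text):
    """Return the text of the <td> right after the <td> whose text is `text`, or None."""
    # find() with both a tag name and string= matches a <td> whose .string equals text
    td = soup.find("td", string=text)
    if td is None:
        return None
    nxt = td.find_next_sibling("td")
    return nxt.get_text(strip=True) if nxt is not None else None


soup = BeautifulSoup(html, "html.parser")
print(cell_after(soup, "AUD/AED"))  # Wednesday 23 APR 2014
```

It returns None both when the text is missing and when it is already the last cell in the row.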
I'd probably use lxml and XPath:
from io import StringIO
from lxml import etree
# table is the HTML as a string (str(table) also works if it is a bs4 Tag)
tree = etree.parse(StringIO(str(table)), etree.HTMLParser())
d = tree.xpath("//table[@class='RESULTS']/tr[./td[3][text()='AUD/AED']]/td[4]/text()")[0]
The variable d should contain the string "Wednesday 23 APR 2014" (call d.strip() if the page's markup puts whitespace around the cell text).
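The XPath above still assumes AUD/AED is in the third column and the date in the fourth. A position-independent variant, sketched under the assumption that the date is the cell right after AUD/AED, matches the cell by its text and then takes its first following sibling. The html string is a trimmed copy of the question's table.

```python
from lxml import etree

html = """
<table class="RESULTS">
<tr><td>AUD</td><td>AED</td><td>AUD/AED</td><td>Wednesday 23 APR 2014</td></tr>
</table>
"""

# lxml's HTML parser wraps the fragment in <html><body>, so search with //.
root = etree.fromstring(html, etree.HTMLParser())

# Find the <td> whose text is AUD/AED anywhere in the RESULTS table,
# then take the first <td> that follows it in the same row.
dates = root.xpath(
    "//table[@class='RESULTS']//td[.='AUD/AED']/following-sibling::td[1]/text()"
)
d = dates[0].strip() if dates else None
print(d)  # Wednesday 23 APR 2014
```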
If you really want BeautifulSoup, you can mix lxml and BeautifulSoup, no problem.