This is probably an easy question, but I'd like to iterate through the tags with id = dgrdAcquired_hyplnkacquired_0, dgrdAcquired_hyplnkacquired_1, etc.
Is there an easier way to do this than the code I have below? The trouble is that the number of these tags differs for each webpage I pull up, so I'm not sure how to get the text from all of them without knowing the count in advance.
html = """
<tr>
<td colspan="3"><table class="datagrid" cellspacing="0" cellpadding="3" rules="rows" id="dgrdAcquired" width="100%">
<tr class="datagridH">
<th scope="col"><font face="Arial" color="Blue" size="2"><b>Name (RSSD ID)</b></font></th><th scope="col"><font face="Arial" color="Blue" size="2"><b>Acquisition Date</b></font></th><th scope="col"><font face="Arial" color="Blue" size="2"><b>Description</b></font></th>
</tr><tr class="datagridI">
<td nowrap="nowrap"><font face="Arial" size="2">
<a id="dgrdAcquired_hyplnkacquired_0" href="InstitutionProfile.aspx?parID_RSSD=3557617&parDT_END=20110429">FIRST CHOICE COMMUNITY BANK (3557617)</a>
</font></td><td><font face="Arial" size="2">
<span id="dgrdAcquired_lbldtAcquired_0">2011-04-30</span>
</font></td><td><font face="Arial" size="2">
<span id="dgrdAcquired_lblAcquiredDescText_0">The acquired institution failed and disposition was arranged of by a regulatory agency. Assets were distributed to the acquiring institution.</span>
</font></td>
</tr><tr class="datagridAI">
<td nowrap="nowrap"><font face="Arial" size="2">
<a id="dgrdAcquired_hyplnkacquired_1" href="InstitutionProfile.aspx?parID_RSSD=104038&parDT_END=20110429">PARK AVENUE BANK, THE (104038)</a>
</font></td>
"""
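from bs4 import BeautifulSoup  # with BS3: from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(html)
# Look up each link by its exact id and join its text nodes.
firm1 = soup.find('a', {"id": "dgrdAcquired_hyplnkacquired_0"})
data1 = ''.join(firm1.findAll(text=True))
print(data1)
firm2 = soup.find('a', {"id": "dgrdAcquired_hyplnkacquired_1"})
data2 = ''.join(firm2.findAll(text=True))
print(data2)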
I would do the following, assuming that if there are n such tags, they are numbered 0...n-1:
soup = BeautifulSoup(html)
i = 0
data = []
while True:
    # Look up the next numbered link; find() returns None when it is missing.
    firm = soup.find('a', {"id": "dgrdAcquired_hyplnkacquired_%s" % i})
    if not firm:
        break
    data.append(''.join(firm.findAll(text=True)))
    print(data[-1])
    i += 1
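If the numbering ever has a gap, the loop above stops at the first missing index. A minimal sketch that drops that assumption is to filter on the id prefix instead of counting. It is written for Python 3 with the bs4 package, which is an assumption (the code in this thread may be running BS3), and the trimmed HTML and the names PREFIX, links and names are illustrative only.

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the page in the question (illustrative only).
html = """
<table id="dgrdAcquired">
<tr><td><a id="dgrdAcquired_hyplnkacquired_0" href="InstitutionProfile.aspx?parID_RSSD=3557617">FIRST CHOICE COMMUNITY BANK (3557617)</a></td>
<td><span id="dgrdAcquired_lbldtAcquired_0">2011-04-30</span></td></tr>
<tr><td><a id="dgrdAcquired_hyplnkacquired_1" href="InstitutionProfile.aspx?parID_RSSD=104038">PARK AVENUE BANK, THE (104038)</a></td></tr>
</table>
"""

PREFIX = "dgrdAcquired_hyplnkacquired_"  # hypothetical constant name

soup = BeautifulSoup(html, "html.parser")

# A function passed as an attribute filter receives the attribute value,
# or None when the tag has no such attribute.
links = soup.find_all(
    "a", id=lambda value: value is not None and value.startswith(PREFIX)
)

# Collect the visible text of every matching link, however many there are.
names = [link.get_text(strip=True) for link in links]
for name in names:
    print(name)
```

This returns the links in document order, so the result does not depend on how many of them a given page contains.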
Regex is probably overkill in this particular case.
Nonetheless here's another option:
import re
soup.find_all('a', id=re.compile(r'^dgrdAcquired_hyplnkacquired_\d+$'))
Please note: if you are using BS3, replace find_all with findAll.
Result (a bit of whitespace removed for purposes of display):
[<a href="InstitutionProfile.aspx?parID_RSSD=3557617&parDT_END=20110429"
id="dgrdAcquired_hyplnkacquired_0">FIRST CHOICE COMMUNITY BANK (3557617)</a>,
<a href="InstitutionProfile.aspx?parID_RSSD=104038&parDT_END=20110429"
id="dgrdAcquired_hyplnkacquired_1">PARK AVENUE BANK, THE (104038)</a>]
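Two details of regex filters are easy to miss here: square brackets define a character class rather than a literal prefix, and Beautiful Soup applies a compiled pattern with its search() method, so an unanchored pattern can match anywhere inside the id. The following standard-library sketch compares a bracketed pattern with an anchored one on the ids from the question; the variable names are illustrative only.

```python
import re

# Ids that appear in the HTML from the question.
ids = [
    "dgrdAcquired_hyplnkacquired_0",
    "dgrdAcquired_hyplnkacquired_1",
    "dgrdAcquired_lbldtAcquired_0",
    "dgrdAcquired_lblAcquiredDescText_0",
]

# Bracketed form: ONE character out of the listed set, followed by digits.
char_class = re.compile(r'[dgrdAcquired_hyplnkacquired_]\d+')

# Anchored form: the literal prefix, then digits, and nothing else.
anchored = re.compile(r'^dgrdAcquired_hyplnkacquired_\d+$')

# search() mirrors how Beautiful Soup applies a compiled pattern.
loose = [i for i in ids if char_class.search(i)]
strict = [i for i in ids if anchored.search(i)]

print(loose)   # every id, because "_0" or "_1" satisfies the character class
print(strict)  # only the two hyplnkacquired ids
```

On this page the bracketed form still returns the right links, but only because the search is restricted to a tags; the span ids would match it as well.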