I am using Python 2.7, mechanize, and beautifulsoup and if it helps I could use urllib
ok, I am trying to download a couple different zip files that are in an different html tables. I know what tables the particular files are in ( I know if they are in the first, second,third ... table)
here is the second table in the html format from the webpage:
<table class="fe-form" cellpadding="0" cellspacing="0" border="0" width="50%">
<tr>
<td colspan="2"><h2>Eligibility List</h2></td>
</tr>
<tr>
<td><b>Eligibility File for Met-Ed</b> -
<a href="/content/fecorp/supplierservices/eligibility_list.suppliereligibility.html?id=ME&ftype=1&fname=cmb_me_elig_lst_06_2013.zip">cmb_me_elig_lst_06_2013.zip</td>
</tr>
<tr>
<td><b>Eligibility File for Penelec</b> -
<a href="/content/fecorp/supplierservices/eligibility_list.suppliereligibility.html?id=PN&ftype=1&fname=cmb_pn_elig_lst_06_2013.zip">cmb_pn_elig_lst_06_2013.zip</td>
</tr>
<tr>
<td><b>Eligibility File for Penn Power</b> -
<a href="/content/fecorp/supplierservices/eligibility_list.suppliereligibility.html?id=PP&ftype=1&fname=cmb_pennelig_06_2013.zip">cmb_pennelig_06_2013.zip</td>
</tr>
<tr>
<td><b>Eligibility File for West Penn Power</b> -
<a href="/content/fecorp/supplierservices/eligibility_list.suppliereligibility.html?id=WP&ftype=1&fname=cmb_wp_elig_lst_06_2013.zip">cmb_wp_elig_lst_06_2013.zip</td>
</tr>
<tr>
<td> </td>
</tr>
</table>
I was going to use the following code just to get to the 2nd table:
from bs4 import BeautifulSoup
html= br.response().read()
soup = BeautifulSoup(html)
table = soup.find("table", class=fe-form)
I guess that class="fe-form" is wrong because it will not work, but there are no other attributes of the table that differentiates it from the other tables. All tables have cellpadding="0" cellspacing="0" border="0" width="50%". I guess I can't use the find() function.
so I am trying to get to the second table and then to download the files on this page. Could someone give me some info to push me in the right direction. I have worked with forms before, but not tables. I wish there was some way to find the find the particular title of the zip files I am looking for then download them since I will always know their names
Thanks for any help, Tom
To select the table you want, simply do
table = soup.find('table', attrs={'class' : 'fe-form', 'cellpadding' : '0' })
This assumes that there is only one table with class=fe-form and cellpadding=0 in your document. If there are more, this code will select only the first table. To be sure you are not overlooking anything on the page, you could do
tables = soup.findAll('table', attrs={'class' : 'fe-form', 'cellpadding' : '0' })
table = tables[0]
And maybe assert that len(tables)==1 to be sure that there is only one table.
Now, to download the file, there is plenty you can do. Assuming from your code that you have loaded mechanize
, you could something like
a_tags = table.findAll('a')
for a in a_tags:
if '.zip' in a.get('href'):
br.retrieve(a.get('href'), a.text)
That would download all files to your current working directory and would name them according to their link text.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.