
Downloading zip files with Python mechanize

I am using Python 2.7, mechanize, and BeautifulSoup, and if it helps I could also use urllib.

OK, I am trying to download a couple of different zip files that are in different HTML tables. I know which table each file is in (whether it is in the first, second, third, ... table).
Here is the HTML of the second table from the webpage:

<table class="fe-form" cellpadding="0" cellspacing="0" border="0" width="50%">
            <tr>
                <td colspan="2"><h2>Eligibility List</h2></td>
            </tr>


            <tr>
                <td><b>Eligibility File for Met-Ed</b> - 
                <a href="/content/fecorp/supplierservices/eligibility_list.suppliereligibility.html?id=ME&ftype=1&fname=cmb_me_elig_lst_06_2013.zip">cmb_me_elig_lst_06_2013.zip</td>
            </tr>



            <tr>
                <td><b>Eligibility File for Penelec</b> - 
                <a href="/content/fecorp/supplierservices/eligibility_list.suppliereligibility.html?id=PN&ftype=1&fname=cmb_pn_elig_lst_06_2013.zip">cmb_pn_elig_lst_06_2013.zip</td>
            </tr>



            <tr>
                <td><b>Eligibility File for Penn Power</b> - 
                <a href="/content/fecorp/supplierservices/eligibility_list.suppliereligibility.html?id=PP&ftype=1&fname=cmb_pennelig_06_2013.zip">cmb_pennelig_06_2013.zip</td>
            </tr>



            <tr>
                <td><b>Eligibility File for West Penn Power</b> - 
                <a href="/content/fecorp/supplierservices/eligibility_list.suppliereligibility.html?id=WP&ftype=1&fname=cmb_wp_elig_lst_06_2013.zip">cmb_wp_elig_lst_06_2013.zip</td>
            </tr>


            <tr>
                <td>&nbsp;</td>
            </tr>
        </table>

I was going to use the following code just to get to the second table:

from bs4 import BeautifulSoup
html= br.response().read()
soup = BeautifulSoup(html)
table = soup.find("table", class=fe-form)

I guess that class="fe-form" is wrong because it does not work, but the table has no other attribute that differentiates it from the other tables: all of them have cellpadding="0" cellspacing="0" border="0" width="50%". So I guess I can't use the find() function.

So I am trying to get to the second table and then download the files on this page. Could someone give me some info to push me in the right direction? I have worked with forms before, but not tables. I wish there were some way to find the particular names of the zip files I am looking for and then download them, since I will always know their names.

Thanks for any help, Tom

To select the table you want, simply do

table = soup.find('table', attrs={'class' : 'fe-form', 'cellpadding' : '0' })

This assumes that there is only one table with class=fe-form and cellpadding=0 in your document. If there are more, this code will select only the first table. To be sure you are not overlooking anything on the page, you could do

tables = soup.findAll('table', attrs={'class' : 'fe-form', 'cellpadding' : '0' })
table = tables[0]

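You might also assert that len(tables) == 1 to be sure that there is only one such table. If several tables match (the question says every table shares these attributes), pick the one you need by position instead, e.g. tables[1] for the second table.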
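When every table carries the same attributes, there are two straightforward ways to get at the second one. The sketch below uses BeautifulSoup 4 on Python 3 and shows both. The first is the class_ keyword argument, which is bs4's spelling of class because class is a reserved word in Python (that is also why class=fe-form in the question is a syntax error), combined with an index. The second anchors on the table's "Eligibility List" heading. The HTML is a trimmed copy of the question's markup; the first table and the table_by_heading helper are hypothetical.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup. The first table is a hypothetical
# stand-in for whatever precedes the "Eligibility List" table on the real page.
HTML = """
<table class="fe-form" cellpadding="0" cellspacing="0" border="0" width="50%">
  <tr><td colspan="2"><h2>Other Section</h2></td></tr>
</table>
<table class="fe-form" cellpadding="0" cellspacing="0" border="0" width="50%">
  <tr><td colspan="2"><h2>Eligibility List</h2></td></tr>
  <tr><td><b>Eligibility File for Penelec</b> -
  <a href="/content/fecorp/supplierservices/eligibility_list.suppliereligibility.html?id=PN&ftype=1&fname=cmb_pn_elig_lst_06_2013.zip">cmb_pn_elig_lst_06_2013.zip</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Option 1: bs4 accepts class_ because 'class' is a Python keyword.
# find_all returns matches in document order, so [1] is the second table.
by_index = soup.find_all("table", class_="fe-form")[1]


def table_by_heading(soup, heading_text):
    """Return the <table> enclosing an <h2> whose text is exactly heading_text, or None."""
    heading = soup.find("h2", string=heading_text)
    return heading.find_parent("table") if heading is not None else None


# Option 2: anchor on the heading text, which keeps working if tables are reordered.
by_heading = table_by_heading(soup, "Eligibility List")
```

The heading-based lookup is a little more robust than a hard-coded index, since adding another table earlier in the page would silently shift the index.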

Now, to download the files, there are several options. Assuming from your code that you are using mechanize with a Browser instance called br, you could do something like:

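from urlparse import urljoin  # Python 2.7; on Python 3 it is urllib.parse.urljoin

a_tags = table.findAll('a')

for a in a_tags:
  href = a.get('href', '')  # tolerate <a> tags without an href
  if '.zip' in href:
    # the hrefs on this page are relative, so resolve them against the current URL
    br.retrieve(urljoin(br.geturl(), href), a.text)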

That would download all files to your current working directory and would name them according to their link text.
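Since you mentioned that you always know the file names, here is a minimal sketch that keeps only the links you want. It assumes Python 3 (on 2.7 the same functions live in the urlparse module), assumes the file name can be read from the fname query parameter seen in the question's hrefs, and uses a hypothetical base URL in place of the real page address. wanted_zip_urls is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urljoin, urlsplit


def wanted_zip_urls(base_url, hrefs, wanted_names):
    """Map each wanted zip file name to an absolute download URL.

    hrefs: raw href values (relative, like the ones in the question's table);
    None or empty entries are skipped. The file name is taken from the
    'fname' query parameter (an assumption based on the question's links).
    """
    found = {}
    for href in hrefs:
        if not href:
            continue
        params = parse_qs(urlsplit(href).query)
        fname = params.get("fname", [""])[0]
        if fname in wanted_names:
            # Relative hrefs must be resolved against the page they came from.
            found[fname] = urljoin(base_url, href)
    return found


# Hypothetical page URL; with mechanize you would pass br.geturl() instead.
BASE = "https://www.example.com/content/fecorp/supplierservices/eligibility_list.html"
HREFS = [
    "/content/fecorp/supplierservices/eligibility_list.suppliereligibility.html?id=ME&ftype=1&fname=cmb_me_elig_lst_06_2013.zip",
    "/content/fecorp/supplierservices/eligibility_list.suppliereligibility.html?id=PN&ftype=1&fname=cmb_pn_elig_lst_06_2013.zip",
]
urls = wanted_zip_urls(BASE, HREFS, {"cmb_pn_elig_lst_06_2013.zip"})
```

With mechanize you would build hrefs from [a.get('href') for a in table.findAll('a')], pass br.geturl() as base_url, and then call br.retrieve(url, name) for each name, url pair in the returned dictionary.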
