
What is the pythonic way of downloading hdf5 files from http server?

I am trying to download an HDF5 file from an HTTP server. I can do this with Python's subprocess module and wget, but I feel like I am cheating:

    # wget solution
    import subprocess
    url = 'http://url/to/file.h5' 
    subprocess.call(['wget', '--proxy=off', url])

I can also use the urllib and requests modules to download images like this:

    # requests solution
    import requests
    url2 = 'http://url/to/image.png'
    r = requests.get(url2)
    with open('image.png', 'wb') as img:
        img.write(r.content)

    # urllib solution (Python 2)
    import urllib
    urllib.urlretrieve(url2, 'outfile.png')
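As a side note, reading `r.content` pulls the whole response into memory at once. For large HDF5 files it is safer to stream the body to disk in chunks; a minimal sketch with requests (the URL and function name below are illustrative, not from the original post):

```python
import requests

def download_file(url, path, chunk_size=1024 * 1024):
    # stream=True defers fetching the body until we iterate over it,
    # so the whole file never has to fit in memory at once.
    with requests.get(url, stream=True) as r:
        r.raise_for_status()  # fail early instead of saving an HTML error page
        with open(path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return path
```

`raise_for_status()` also guards against the exact failure described below: silently writing an HTML error page to a `.h5` file.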

However, when I try to download the HDF5 file with either of these methods and run the shell command `file` on the result, I get:

    >file test.h5 
    >test.h5: HTML document, ASCII text, with very long lines

Here are the response headers from requests.get() (not sure if they help):

    {'accept-ranges': 'bytes',
     'content-length': '413399',
     'date': 'Tue, 19 Feb 2013 08:51:06 GMT',
     'etag': 'W/"413399-1361177055000"',
     'last-modified': 'Mon, 18 Feb 2013 08:44:15 GMT',
     'server': 'Apache-Coyote/1.1'}

Should I just use wget through subprocess, or is there a pythonic solution?

Solution: The problem was that I didn't disable the proxy before downloading the file, so the transfer was intercepted and I received an HTML page instead of the HDF5 data. This piece of code did the trick:

    import urllib2
    proxy_handler = urllib2.ProxyHandler({})
    opener = urllib2.build_opener(proxy_handler)
    urllib2.install_opener(opener)

    url = 'http://url/to/file.h5'

    req = urllib2.Request(url)
    r = opener.open(req)
    result = r.read()

    with open('my_file.h5', 'wb') as f:
        f.write(result)
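Note that urllib2 is Python 2 only. The same proxy-bypassing approach works in Python 3, where urllib2's functionality lives in urllib.request; a sketch under that assumption (the function name is illustrative):

```python
# Python 3 equivalent of the urllib2 snippet above: ProxyHandler and
# build_opener work the same way, just under urllib.request.
import urllib.request

def download_without_proxy(url, path):
    # An empty ProxyHandler ignores any proxy configured in the
    # environment (http_proxy etc.), so nothing intercepts the transfer.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as r, open(path, 'wb') as f:
        f.write(r.read())
    return path
```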

Try using urllib's geturl() to obtain the real URL (following any redirects), then pass that URL to urlretrieve.
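That suggestion can be sketched in Python 3, where urlopen follows HTTP redirects transparently and geturl() on the response reports the final URL (the function name below is illustrative):

```python
import urllib.request

def resolve_and_fetch(url, path):
    # urlopen follows redirects automatically; geturl() reports the
    # URL we actually ended up at after any redirects.
    with urllib.request.urlopen(url) as r:
        final_url = r.geturl()
    # hand the resolved URL to urlretrieve for the actual download
    urllib.request.urlretrieve(final_url, path)
    return final_url
```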
