
How can I get the file size from a link without downloading it in Python?

I have a list of links, and I am trying to get the size of each file to estimate how much computational resources it will need. Is it possible to get just the file size with a GET request or something similar?

Here is an example of one of the links: https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887

Thanks

If you're using Python 3, you can do it using urlopen from urllib.request:

from urllib.request import urlopen

link = "https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887"
# urlopen returns once the response headers have arrived; the body is not read here
with urlopen(link) as site:
    meta = site.info()
print(meta)

This will output:

Server: nginx
Date: Mon, 18 Mar 2019 17:02:40 GMT
Content-Type: application/octet-stream
Content-Length: 578220087
Last-Modified: Tue, 21 Feb 2017 12:13:19 GMT
Connection: close
Accept-Ranges: bytes

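The Content-Length header is the size of your file in bytes. Note that urlopen sends a GET request here: the body is not read unless you call read() on the response, but the server still starts sending it, so a HEAD request is the cleaner option.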
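Header values arrive as strings, so for resource planning it helps to convert Content-Length to an integer and format it for reading. A minimal sketch follows; the helper names content_length and format_size are made up for illustration, and it assumes the headers object offers a get() method (as the object returned by site.info() does).

```python
def content_length(headers):
    """Return Content-Length as an int, or None if it is missing or not a number."""
    value = headers.get("Content-Length")
    if value is None or not value.strip().isdigit():
        return None
    return int(value)


def format_size(num_bytes):
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} TiB"


print(format_size(578220087))  # 551.43 MiB
```

With the snippet above, content_length(meta) should give 578220087 for the example link, as long as the server reports the same headers.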

You need to use the HEAD method. The example uses requests (pip install requests).

#!/usr/bin/env python
# display size of remote file without downloading

import sys
import requests

# pass URL as first argument
response = requests.head(sys.argv[1], allow_redirects=True)

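# Content-Length may be absent (e.g. chunked responses)
size = response.headers.get('content-length')

# print size in megabytes, or say so when the server did not report it
if size is None:
    print('\t{:<40}: unknown'.format('FILE SIZE'))
else:
    print('\t{:<40}: {:.2f} MB'.format('FILE SIZE', int(size) / float(1 << 20)))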

Also see "How do you send a HEAD HTTP request in Python 2?" if you want standard-library based solutions.
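If a server does not answer HEAD or leaves out Content-Length, one possible fallback is a GET that asks for only the first byte. A server that honours range requests (the example host advertises Accept-Ranges: bytes) would normally reply with a Content-Range header such as "bytes 0-0/578220087", where the number after the slash is the total size. This is only a sketch using the standard library; total_from_content_range and size_via_range are hypothetical names, and the opener parameter exists only so the function can be tried without a network.

```python
from urllib.request import Request, urlopen


def total_from_content_range(value):
    """Return the total size from e.g. 'bytes 0-0/578220087', or None if unknown."""
    if not value or "/" not in value:
        return None
    total = value.rsplit("/", 1)[1].strip()
    # The total may be '*' when the server does not know the full length.
    return int(total) if total.isdigit() else None


def size_via_range(url, opener=urlopen):
    """Request only byte 0 and read the total size from Content-Range."""
    request = Request(url, headers={"Range": "bytes=0-0"})
    with opener(request) as response:
        return total_from_content_range(response.headers.get("Content-Range"))
```

If the server ignores the Range header, there is no Content-Range in the reply and the function returns None, so the caller still has to handle the unknown case.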

To do this, use the HTTP HEAD method, which retrieves only the header information for the URL and doesn't download the content as an HTTP GET request does.

$ curl -I https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887
HTTP/1.1 200 OK
Server: nginx
Date: Mon, 18 Mar 2019 16:56:35 GMT
Content-Type: application/octet-stream
Content-Length: 578220087
Last-Modified: Tue, 21 Feb 2017 12:13:19 GMT
Connection: keep-alive
Accept-Ranges: bytes

The file size is in the 'Content-Length' header. In Python 3.6:

>>> import urllib.request
>>> req = urllib.request.Request('https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887',
...                              method='HEAD')
>>> f = urllib.request.urlopen(req)
>>> f.status
200
>>> f.headers['Content-Length']
'578220087'
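Since the question is about a list of links, the same HEAD request can be wrapped in a loop. The following is a rough sketch, not a definitive implementation: head_size and sizes_for are illustrative names, the 10-second timeout is an arbitrary choice, and the opener parameter is only there so the code can be exercised with a stand-in instead of a real connection.

```python
from urllib.request import Request, urlopen


def head_size(url, opener=urlopen, timeout=10):
    """Send a HEAD request and return Content-Length as an int, or None."""
    request = Request(url, method="HEAD")
    with opener(request, timeout=timeout) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length and length.isdigit() else None


def sizes_for(links, opener=urlopen):
    """Map each link to its size in bytes, or None when it cannot be determined."""
    sizes = {}
    for link in links:
        try:
            sizes[link] = head_size(link, opener=opener)
        except (OSError, ValueError):
            # URLError, HTTPError and timeouts are OSError subclasses;
            # a malformed URL raises ValueError.
            sizes[link] = None
    return sizes
```

Calling sizes_for(list_of_links) returns a dict keyed by URL, which can then be sorted or summed to plan resources. Links mapped to None need a different approach, since not every server reports a length for HEAD.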
