How can I get the file size from a link without downloading it in Python?

Question

I have a list of links, and I am trying to get the size of each file to determine how many computational resources it needs. Is it possible to get just the file size with a GET request or something similar?

Here is an example of one of the links: https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887

Thanks

Answer 1

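If you're using Python 3, you can do it using urlopen from urllib.request:

from urllib.request import urlopen

link = "https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887"
# This sends a GET request, but the response body is not read here;
# only the headers are inspected before the connection is closed.
with urlopen(link) as site:
    meta = site.info()  # the response headers
print(meta)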

This will output:

Server: nginx
Date: Mon, 18 Mar 2019 17:02:40 GMT
Content-Type: application/octet-stream
Content-Length: 578220087
Last-Modified: Tue, 21 Feb 2017 12:13:19 GMT
Connection: close
Accept-Ranges: bytes

The Content-Length header is the size of your file in bytes.
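
To use that value in code rather than reading it off the printed headers, it can be parsed into an integer. The sketch below is one possible way to do that, not part of the original answer: `content_length` is a hypothetical helper name, and it assumes a mapping-like headers object such as the one returned by `site.info()`. It returns `None` when the header is missing or malformed, since a server is not obliged to send it.

```python
from typing import Optional


def content_length(headers) -> Optional[int]:
    """Return the Content-Length value as an int, or None if unusable.

    `headers` is assumed to be mapping-like (for example the object
    returned by site.info()). A plain dict also works, but note that a
    dict lookup is case-sensitive while HTTP header objects are not.
    """
    value = headers.get("Content-Length")
    if value is None:
        return None  # header not sent by the server
    try:
        size = int(value)
    except ValueError:
        return None  # header present but not a number
    return size if size >= 0 else None
```

With the headers shown above, `content_length(meta)` would give the size as an integer number of bytes.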

Answer 2

You need to use the HEAD method. The example uses requests (pip install requests).

#!/usr/bin/env python
# display size of remote file without downloading

import sys
import requests

# pass URL as first argument
response = requests.head(sys.argv[1], allow_redirects=True)

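# Content-Length may be missing, so check before converting it
size = response.headers.get('content-length')

if size is None:
    print('\t{:<40}: unknown'.format('FILE SIZE'))
else:
    # print size in megabytes (1 MB taken as 2**20 bytes)
    print('\t{:<40}: {:.2f} MB'.format('FILE SIZE', int(size) / float(1 << 20)))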

Also see "How do you send a HEAD HTTP request in Python 2?" if you want a solution based on the standard library.
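
As a small extension of the megabyte conversion in the script above, the byte count can be scaled to whichever unit fits. This is only an illustrative sketch: `format_size` is a hypothetical helper, and it uses binary units (1 KiB = 1024 bytes), which matches the `1 << 20` divisor used above.

```python
def format_size(num_bytes):
    """Format a byte count using binary units (B, KiB, MiB, GiB, TiB)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024.0:
            return "{:.2f} {}".format(size, unit)
        size /= 1024.0  # move up to the next larger unit
    return "{:.2f} TiB".format(size)


# The size reported for the example link in the question
print(format_size(578220087))  # 551.43 MiB
```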

Answer 3

To do this, use the HTTP HEAD method, which retrieves only the header information for the URL and, unlike an HTTP GET request, does not download the content.

$ curl -I https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887
HTTP/1.1 200 OK
Server: nginx
Date: Mon, 18 Mar 2019 16:56:35 GMT
Content-Type: application/octet-stream
Content-Length: 578220087
Last-Modified: Tue, 21 Feb 2017 12:13:19 GMT
Connection: keep-alive
Accept-Ranges: bytes

The file size is in the 'Content-Length' header. In Python 3.6:

>>> import urllib.request
>>> req = urllib.request.Request('https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887', 
                                 method='HEAD')
>>> f = urllib.request.urlopen(req)
>>> f.status
200
>>> f.headers['Content-Length']
'578220087'
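
Since the question is about a list of links, the same HEAD request can be wrapped in a function and applied to each URL. The following is a minimal sketch under a few assumptions: the names `head_content_length` and `remote_sizes` are hypothetical, and the `opener` parameter exists only so that something other than `urlopen` can be substituted (for example when testing offline). Servers that do not report a Content-Length yield `None`.

```python
from urllib.request import Request, urlopen


def head_content_length(url, timeout=10.0, opener=urlopen):
    """Send a HEAD request and return Content-Length in bytes, or None if absent."""
    request = Request(url, method="HEAD")
    with opener(request, timeout=timeout) as response:
        value = response.headers.get("Content-Length")
    return int(value) if value is not None else None


def remote_sizes(urls, opener=urlopen):
    """Map each URL to its reported size in bytes (None when not reported)."""
    return {url: head_content_length(url, opener=opener) for url in urls}
```

Calling `remote_sizes(links)` on a list of URLs returns a dictionary keyed by URL. Errors such as `urllib.error.HTTPError` are left to propagate, so a caller that wants to skip unreachable links would need to catch them.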

solution1
1 2019-03-18 17:03:42

solution2
1 2019-03-18 17:10:29

solution3
1 ACCEPTED 2019-03-18 17:11:49
