I have a list of links whose file sizes I want to determine, so I can estimate how much computational resources each file needs. Is it possible to get just the file size with a GET request or something similar?
Here is an example of one of the links: https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887
Thanks
If you're using Python 3, you can do it using urlopen from urllib.request:
from urllib.request import urlopen
link = "https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887"
site = urlopen(link)
meta = site.info()
print(meta)
This will output:
Server: nginx
Date: Mon, 18 Mar 2019 17:02:40 GMT
Content-Type: application/octet-stream
Content-Length: 578220087
Last-Modified: Tue, 21 Feb 2017 12:13:19 GMT
Connection: close
Accept-Ranges: bytes
The Content-Length header is the size of your file in bytes.
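The header value arrives as a string and a server may omit it (for example with chunked responses), so a small helper can make it easier to work with. The following is only an illustrative sketch; `content_length_bytes` and `human_size` are hypothetical names, not part of urllib.

```python
def content_length_bytes(headers):
    """Return Content-Length as an int, or None if missing or malformed."""
    value = headers.get("Content-Length")
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def human_size(num_bytes):
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} TiB"
```

With the `meta` object from the snippet above, `human_size(content_length_bytes(meta))` would give `551.43 MiB` for the example file.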
You need to use the HEAD method. The example uses requests (pip install requests).
#!/usr/bin/env python
# display size of remote file without downloading
import sys
import requests
# pass URL as first argument
response = requests.head(sys.argv[1], allow_redirects=True)
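# Content-Length may be absent (e.g. chunked responses)
size = response.headers.get('content-length')
if size is None:
    sys.exit('Content-Length header not present')
# print size in megabytes (1 MB = 2**20 bytes here)
print('\t{:<40}: {:.2f} MB'.format('FILE SIZE', int(size) / float(1 << 20)))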
Also see "How do you send a HEAD HTTP request in Python 2?" if you want a solution based on the standard library.
To do this, use the HTTP HEAD method, which fetches only the header information for the URL and doesn't download the content the way an HTTP GET request does.
$ curl -I https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887
HTTP/1.1 200 OK
Server: nginx
Date: Mon, 18 Mar 2019 16:56:35 GMT
Content-Type: application/octet-stream
Content-Length: 578220087
Last-Modified: Tue, 21 Feb 2017 12:13:19 GMT
Connection: keep-alive
Accept-Ranges: bytes
The file size is in the 'Content-Length' header. In Python 3.6:
>>> import urllib.request
>>> req = urllib.request.Request('https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887',
...                              method='HEAD')
>>> f = urllib.request.urlopen(req)
>>> f.status
200
>>> f.headers['Content-Length']
'578220087'
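Since the question mentions a list of links, the same HEAD request can be wrapped in a loop. Below is a minimal sketch using only the standard library; `head_size` and `sizes_for` are hypothetical helper names, and the `opener` parameter is an assumption added so the function can be exercised without network access. A server that omits Content-Length or rejects HEAD yields None here rather than a size.

```python
import urllib.error
import urllib.request


def head_size(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return the Content-Length of url in bytes via HEAD, or None if unknown."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with opener(req, timeout=timeout) as resp:
            value = resp.headers.get("Content-Length")
    except (urllib.error.URLError, OSError):
        # Covers HTTP errors (e.g. 405 for HEAD), DNS failures and timeouts.
        return None
    try:
        return int(value) if value is not None else None
    except ValueError:
        return None


def sizes_for(urls, **kwargs):
    """Map each URL to its size in bytes (None where it could not be determined)."""
    return {url: head_size(url, **kwargs) for url in urls}
```

Summing the non-None values of `sizes_for(links)` would then give a rough total to plan resources around.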