Unable to download url link with requests in Python

Question

The objective is to download a tar.gz archive from cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K-Word_V3.0.tar.gz

The file can be downloaded without any issue using wget:

!wget cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K-Word_V3.0.tar.gz --no-check-certificate

However, downloading the file with requests

import requests
url='cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K-Word_V3.0.tar.gz'
r = requests.get(url)

raises an error:

MissingSchema                             Traceback (most recent call last)

<ipython-input-11-fa35f2c0ddc0> in <module>()
      1 url='cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K-Word_V3.0.tar.gz'
----> 2 r = requests.get(url)

5 frames

/usr/local/lib/python3.7/dist-packages/requests/models.py in prepare_url(self, url, params)
    386             error = error.format(to_native_string(url, 'utf8'))
    387 
--> 388             raise MissingSchema(error)
    389 
    390         if not host:

MissingSchema: Invalid URL 'cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K-Word_V3.0.tar.gz': No schema supplied. Perhaps you meant http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K-Word_V3.0.tar.gz?

What is causing this error?

Answer 1

Your url variable is missing http:// or https:// (the scheme, as the error message says).

url = 'https://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K-Word_V3.0.tar.gz'
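If URLs come from user input or a file, it may help to add the scheme automatically before calling requests. The following is a minimal sketch using only the standard library; the helper name ensure_scheme and the default of "http" (the scheme suggested in the error message) are assumptions for illustration, not part of requests.

```python
import re

# Matches a leading scheme such as "http://" or "ftp://".
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def ensure_scheme(url, default="http"):
    """Return url unchanged if it already has a scheme, else prepend default://.

    Hypothetical helper: requests itself does not guess the scheme,
    which is why it raises MissingSchema.
    """
    url = url.strip()
    if _SCHEME_RE.match(url):
        return url
    return f"{default}://{url}"


url = ensure_scheme("cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K-Word_V3.0.tar.gz")
print(url)
```

The resulting string can then be passed to requests.get(url) without triggering MissingSchema.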

Answer 2

Your URL is missing the http:// scheme:

import requests
requests.get("http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K-Word_V3.0.tar.gz")
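Note that requests.get only fetches the response; it does not write anything to disk. One possible way to save the archive is to stream the body in chunks, as sketched below. The function names save_stream and download, the chunk size, the timeout, and the output filename are assumptions chosen for illustration.

```python
def save_stream(response, dest, chunk_size=8192):
    """Write a streamed response body to dest and return the number of bytes written."""
    # Fail on 4xx/5xx instead of silently saving an error page.
    response.raise_for_status()
    written = 0
    with open(dest, "wb") as fh:
        for chunk in response.iter_content(chunk_size=chunk_size):
            fh.write(chunk)
            written += len(chunk)
    return written


def download(url, dest, timeout=30):
    """Download url to dest without loading the whole file into memory."""
    # Third-party package; imported here so save_stream can be used on its own.
    import requests

    with requests.get(url, stream=True, timeout=timeout) as response:
        return save_stream(response, dest)


if __name__ == "__main__":
    # The output filename is an arbitrary choice.
    download(
        "http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K-Word_V3.0.tar.gz",
        "IIIT5K-Word_V3.0.tar.gz",
    )
```

Using stream=True with iter_content keeps memory use bounded, which matters for large archives.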

This should also work, using the third-party wget package:

import wget
wget.download("http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K-Word_V3.0.tar.gz", out="YOUR_PATH")

solution1
0 ACCEPTED 2022-06-10 06:49:15

solution2
-1 2022-06-10 06:38:05
