Resolving HTTP Error 400: Bad Request for links that work in Google Chrome

Question

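I know there are already many questions like this, but I can't seem to find an answer, so I hope to get some help here. I am trying to download the files stored behind a list of URLs.

I found the following code, which does what I want: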

import os.path
import urllib.request
import requests

for link in links:
    link = link.strip()
    name = link.rsplit('/', 1)[-1]
    filename = os.path.join('downloads', name)

    if not os.path.isfile(filename):
        print('Downloading: ' + filename)
        try:
            urllib.request.urlretrieve(link, filename)
        except Exception as inst:
            print(inst)
            print('  Encountered unknown error. Continuing.')

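I always get: HTTP Error 400: Bad Request.

I tried setting a user agent to imitate a browser visit (I use Google Chrome), but it did not help at all. The links work when I copy them into the browser, so I would like to know how to solve this.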

Answer 1

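The spaces must be quoted (percent-encoded). I use the quote function to quote the file name in the link, and rindex to split the last part off the URL path. The urlsplit and urlunsplit functions should be used instead of string manipulation, but... I'm too lazy :D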

import os.path
import urllib.request
from urllib.parse import quote

links = ['https://undpgefpims.org/attachments/6222/216410/1717887/1724973/6222_4NC_3BUR_Macedonia_Final ProDoc 30 July 2018.doc', 'https://undpgefpims.org/attachments/6214/216405/1719672/1729436/6214_4NC_Niger_ProDoc  final for DoA.doc']

for link in links:
    link = link.strip()
    name = link.rsplit('/', 1)[-1]
    filename = os.path.join('downloads', name)

    if not os.path.isfile(filename):
        print('Downloading: ' + filename)
        try:
            urllib.request.urlretrieve(link[:link.rindex('/') + 1] + quote(link[link.rindex('/') + 1:]), filename)
        except Exception as inst:
            print(inst)
            print('  Encountered unknown error. Continuing.')
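
For reference, the urlsplit/urlunsplit approach mentioned above could look roughly like the sketch below. The helper name `quote_url_path` and the example.com URL are hypothetical, chosen only for illustration. The sketch assumes the input URL is not already percent-encoded; an existing `%20` would be encoded a second time.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_url_path(url):
    """Percent-encode only the path component of a URL (hypothetical helper)."""
    parts = urlsplit(url)
    # quote() leaves '/' unescaped by default, so path separators survive.
    # Scheme, host, query and fragment are passed through unchanged.
    return urlunsplit(parts._replace(path=quote(parts.path)))


# Hypothetical URL with spaces in the file name
print(quote_url_path('https://example.com/files/My Report 2018.doc'))
# https://example.com/files/My%20Report%202018.doc
```

The result can be passed to urllib.request.urlretrieve in place of the hand-sliced string. Because only the path is quoted, a query string on the link is left as it was.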

Answer 2

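I found the answer to my own question.

The problem was that the URLs contain spaces, which urllib.request apparently cannot handle correctly. The solution is to quote (percent-encode) the URLs first and then request the quoted URL.

Here is the working code for anyone who runs into the same problem: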

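import os.path
import urllib.request
import urllib.parse

# urls: the list of download links (strings), which may contain spaces
for link in urls:
    link = link.strip()
    name = link.rsplit('/', 1)[-1]
    filename = os.path.join(name)  # the file is saved in the current directory
    # Percent-encode everything except ':' and '/', so spaces become %20
    quoted_url = urllib.parse.quote(link, safe=":/")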

    if not os.path.isfile(filename):
        print('Downloading: ' + filename)
        try:
            urllib.request.urlretrieve(quoted_url, filename)
        except Exception as inst:
            print(inst)
            print('  Encountered unknown error. Continuing.')
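
To make the quoting step concrete, here is a small sketch of what `urllib.parse.quote` with `safe=":/"` produces. Both URLs are hypothetical examples, not links from the question. The second call illustrates a limitation worth keeping in mind: with this `safe` setting, the characters `?` and `=` are escaped too, so the approach is only suitable for links without a query string.

```python
from urllib.parse import quote

# Hypothetical URL with spaces in the file name
url = 'https://example.com/files/Final ProDoc 30 July 2018.doc'
quoted = quote(url, safe=':/')
print(quoted)
# https://example.com/files/Final%20ProDoc%2030%20July%202018.doc

# Caveat: '?' and '=' are not in the safe set, so a query string is mangled.
with_query = quote('https://example.com/get?name=a b', safe=':/')
print(with_query)
# https://example.com/get%3Fname%3Da%20b
```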

Answer 1
0 2019-05-09 09:59:23

Answer 2
0 Accepted 2019-05-09 10:00:27
