
urllib3 download a file using specified user agent

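What is the correct way to update the user agent information in urllib3?

How can I check that the user agent has actually been changed and is being used?

For example: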

import urllib3

user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0'}
http = urllib3.PoolManager(10, headers=user_agent)

r1 = http.request('GET', 'http://example.com/')
if r1.status == 200:  # compare by value; `is` tests object identity
    # r1.data is bytes, so open the file in binary mode
    with open('somefile', 'wb') as f:
        f.write(r1.data)

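When I create a PoolManager as http and inspect it with dir(http), I can see that http.headers is empty by default and is updated to the specified user agent, but is it actually being used? Is there any way to check this without having to look at the Apache logs?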

Here is what /var/log/apache2/access.log actually shows after trying to update the user agent:

>>> import urllib3
>>> user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0'}
>>> http = urllib3.PoolManager(2, headers=user_agent)
>>> r = http.request('GET','localhost')
>>> with open('/var/log/apache2/access.log','r') as f:
...     last_line = f.readlines()[-1]
... 
>>> last_line
'127.0.0.1 - - [08/Dec/2014:20:42:04 -0500] "GET / HTTP/1.1" 200 461 "-" "-"\n'

The keyword argument should be headers, not header as in this line:

http = urllib3.PoolManager(10, header=user_agent)
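
For comparison, here is a minimal sketch of the call with the headers keyword, followed by a check that the pool manager stored the dict. The helper urllib3.util.make_headers is used only as a convenience for building the dict, and a plain dict like the one in the question should behave the same way. The variable names ua and headers are chosen for illustration.

```python
import urllib3
from urllib3.util import make_headers

ua = "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0"

# make_headers builds a plain dict with a lowercase 'user-agent' key
headers = make_headers(user_agent=ua)

# Note the plural keyword: headers=, not header=
http = urllib3.PoolManager(10, headers=headers)

# The manager keeps these as its default headers for later requests
print(http.headers)
```

This only shows what the manager will send by default; it does not prove what reached the server. Also note that, as far as I know, headers passed to an individual request() call are used instead of the manager-level ones rather than merged with them.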

You can use the httpbin.org site to confirm that the headers are set correctly:

>>> import urllib3
>>> user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) ..'}
>>> http = urllib3.PoolManager(10, headers=user_agent)
>>> r1 = http.urlopen('GET', 'http://httpbin.org/headers')
>>> print(r1.data)
{
  "headers": {
    "Accept-Encoding": "identity",
    "Connect-Time": "1",
    "Connection": "close",
    "Host": "httpbin.org",
    "Total-Route-Time": "0",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0",
    "Via": "1.1 vegur",
    "X-Request-Id": "5ef53f21-6caf-4e45-8123-98e417cd05ba"
  }
}

Or you can use a packet analyzer (such as Wireshark).
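
If neither an external service nor a packet capture is convenient, the same check can be done against a throwaway local server. The following is a minimal sketch, assuming Python 3 and the standard library http.server module. The EchoUserAgent class and the fetch_user_agent helper are hypothetical names made up for this example and are not part of urllib3.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import urllib3


class EchoUserAgent(BaseHTTPRequestHandler):
    """Hypothetical handler: replies with the User-Agent header it received."""

    def do_GET(self):
        body = self.headers.get("User-Agent", "").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the default per-request console logging


def fetch_user_agent(headers):
    """Start a local server, request it via urllib3, return the User-Agent it saw."""
    server = HTTPServer(("127.0.0.1", 0), EchoUserAgent)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        http = urllib3.PoolManager(10, headers=headers)
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        r = http.request("GET", url)
        return r.data.decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


user_agent = {"user-agent": "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0"}
seen = fetch_user_agent(user_agent)
print(seen)
```

If the headers were applied, the printed value should match the string in user_agent. An empty or different value would suggest the headers never reached the request.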

