
Want to fetch information from a website but it shows “250 forbidden” with Python

I am using Python to fetch information from a website. The script is very simple:

from urllib2 import *

website='http://www.haodf.com'
web=urlopen(website)
content=web.read()#This makes python visit and fetch the content of the website

print content

and it returns:

    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html>
    <head>
    <title>250 Forbidden</title>
    </head>
    <body>
    <h1>250 Forbidden</h1>
    </body>
    </html>

Why does the content say “250 Forbidden”? It seems I cannot actually access this website, even though the same script works with other sites such as google.com.

This particular website requires a User-Agent header to be sent with the request:

>>> import urllib2
>>> request = urllib2.Request("http://www.haodf.com", headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36'})
>>> print urllib2.urlopen(request).read()
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
...
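
For readers on Python 3, where urllib2 was merged into urllib.request, the same idea might look like the sketch below. The names BROWSER_UA, build_request and fetch are made up for illustration, and the utf-8 fallback is an assumption: this page declares gb2312 in a meta tag, so the fallback may need adjusting if the server does not send a charset in its Content-Type header.

```python
import urllib.request

# Browser-like User-Agent copied from the example above; any realistic value
# may work, this one is only an illustration.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10, fallback_charset="utf-8"):
    """Fetch url with the browser-like User-Agent and return decoded text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        # Use the charset from the Content-Type header when the server sends one.
        charset = resp.headers.get_content_charset() or fallback_charset
        return resp.read().decode(charset, errors="replace")
```

Calling print(fetch("http://www.haodf.com", fallback_charset="gb2312")) would then correspond to the urllib2 session above. Whether the site still responds the same way today is not something this sketch can guarantee.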

Or switch to requests (which sends a User-Agent header by default):

>>> import requests
>>> response = requests.get('http://www.haodf.com')
>>> response.request.headers
CaseInsensitiveDict({'Accept-Encoding': 'gzip, deflate, compress', 'Accept': '*/*', 'User-Agent': 'python-requests/2.2.1 CPython/2.7.5 Darwin/13.3.0'})

>>> print response.text
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
...
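
If adding a dependency is not an option, the "send a User-Agent on every request" behaviour can be approximated with the standard library alone. This is a rough Python 3 sketch, not part of the original answer; make_opener and BROWSER_UA are hypothetical names chosen for illustration.

```python
import urllib.request

# Browser-like User-Agent reused from the earlier example (illustrative value).
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36"
)


def make_opener(user_agent=BROWSER_UA):
    """Build an opener whose default headers replace the Python-urllib User-Agent."""
    opener = urllib.request.build_opener()
    # addheaders holds the headers added to every request made through this opener.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


opener = make_opener()
# Optional: make plain urllib.request.urlopen() calls use this opener too.
# urllib.request.install_opener(opener)
```

With the opener installed, a plain urlopen(website) call like the one in the question would send the browser-like header without any other change to the script.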

