urllib vs. urllib.request in Python 3 - Enthought Canopy

Question

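I'm seeing a strange difference between Enthought Canopy and the command line when trying to load and use urllib and/or urllib.request.

Here is what I mean. I'm running Python 3.5 on MacOS 10.11.3, but I also tried this on a Windows 10 machine and got the same results. The difference seems to be between using Canopy and using the command line.

I'm trying to do some basic screen scraping. Based on my reading, I think I should do this: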

from urllib.request import urlopen
html = urlopen("http://pythonscraping.com/pages/page1.html")
print(html.read())

This works at the command prompt.

Inside Canopy, however, it does not work. Canopy reports the error

ImportError: No module named request

when it tries to execute from urllib.request import urlopen.

Inside Canopy, this is what works:

import urllib
html = urllib.urlopen("http://pythonscraping.com/pages/page1.html")
print(html.read())

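I would really like to understand what is happening, because I don't want my Canopy Python scripts to fail when I run them outside of Canopy. Also, the Canopy approach does not seem consistent with the documentation I have read... I only got it working by trial and error.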

Answer 1

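urllib.request is a module that only exists in Python 3. The Enthought Canopy distribution still ships with Python 2.7 (2.7.10 as of the current version, 1.6.2).

In Python 2.x, you have the choice of using either urllib or urllib2, which expose functions like urlopen at the top level (e.g. urllib.urlopen rather than urllib.request.urlopen).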
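To confirm which interpreter is actually running the script, it can help to print the version from inside both Canopy and the command line. The following is only a minimal sketch: the helper name urlopen_location is hypothetical and exists purely to illustrate where urlopen lives in each major version.

```python
from __future__ import print_function

import sys


def urlopen_location(major_version):
    """Return the dotted name under which urlopen is found.

    Hypothetical helper for illustration only.
    """
    # Python 3 moved urlopen into the urllib.request submodule;
    # Python 2 exposes it at the top level of urllib.
    if major_version >= 3:
        return "urllib.request.urlopen"
    return "urllib.urlopen"


if __name__ == "__main__":
    # Shows the version of the interpreter that is executing this file.
    print(sys.version)
    print(urlopen_location(sys.version_info[0]))
```

If the first printed line starts with 2.7 inside Canopy and with 3.5 at the command prompt, the ImportError is explained by the interpreter version rather than by the operating system.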

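If you want your scripts to run under either Python 3.x or Enthought Canopy's Python distribution, there are two possible solutions:

Use requests. This is generally the recommended library for interacting with HTTP in Python. It is a third-party module, which you can install with standard pip or easy_install, or from the Canopy Package Index.

Your equivalent code would look similar to: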

# This allows you to use the print() function inside Python 2.x
from __future__ import print_function
import requests

response = requests.get("http://pythonscraping.com/pages/page1.html")
print(response.text)

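Use conditional imports to bring in the functionality you need regardless of version. This relies only on built-in features of Python and does not require third-party libraries.

Your code would then look similar to: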

# This allows you to use the print() function inside Python 2.x
from __future__ import print_function
import sys

try:
    # Try importing Python 3's urllib.request first.
    from urllib.request import urlopen
except ImportError:
    # Looks like we're running Python 2.something.
    from urllib import urlopen

response = urlopen("http://pythonscraping.com/pages/page1.html")

# urllib.urlopen's response object is different based
# on Python version.
if sys.version_info[0] < 3:
    print(response.read())
else:
    # Python 3's urllib responses return the
    # stream as a byte-stream, and it's up to you
    # to properly set the encoding of the stream. This
    # block just checks if the stream has a content-type set
    # and if not, it defaults to just using utf-8
    encoding = response.headers.get_content_charset()
    if not encoding:
        encoding = 'utf-8'
    print(response.read().decode(encoding))
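If the conditional import is needed in more than one script, it may be convenient to wrap it in a small function. The sketch below is one possible variation, not part of the original answer: the name fetch_text and its default_encoding parameter are hypothetical, and unlike the code above it decodes the body on Python 2 as well, so it returns text on both versions.

```python
try:
    # Python 3: urlopen lives in urllib.request.
    from urllib.request import urlopen
except ImportError:
    # Python 2: urlopen is exposed at the top level of urllib.
    from urllib import urlopen


def fetch_text(url, default_encoding="utf-8"):
    """Fetch a URL and return its body as decoded text.

    Hypothetical helper; falls back to default_encoding when the
    response does not declare a charset.
    """
    response = urlopen(url)
    try:
        headers = response.headers
        if hasattr(headers, "get_content_charset"):
            # Python 3: headers is an email.message.Message.
            charset = headers.get_content_charset()
        else:
            # Python 2: headers is a mimetools.Message.
            charset = headers.getparam("charset")
        return response.read().decode(charset or default_encoding)
    finally:
        # Release the underlying connection or file handle.
        response.close()
```

With this in place, the scraping example reduces to print(fetch_text("http://pythonscraping.com/pages/page1.html")) under either interpreter (with the print_function import on Python 2).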

Answer 1
2 2016-01-24 16:10:55