urllib vs. urllib.request in Python3 - Enthought Canopy

I'm getting strange differences between Enthought Canopy and the command line when trying to import and use urllib and/or urllib.request.

Here's what I mean. I'm running Python 3.5 on Mac OS X 10.11.3, but I've tried this on a Windows 10 machine too and I get the same results. The difference appears to be between using Canopy and using the command line.

I'm trying to do basic screen scraping. Based on what I've read, I think I should be doing:

from urllib.request import urlopen
html = urlopen("http://pythonscraping.com/pages/page1.html")
print(html.read())

This works at a command prompt.

But inside Canopy, this does not work. When Canopy tries to execute from urllib.request import urlopen, I get the error:

ImportError: No module named request

Inside Canopy, THIS is what works:

import urllib
html = urllib.urlopen("http://pythonscraping.com/pages/page1.html")
print(html.read())

I would really like to understand what is happening, as I don't want my Canopy python scripts to fail when I run them outside of Canopy. Also, the Canopy approach does not seem consistent with docs that I've read... I just got there by trial & error.

urllib.request is a module that only exists in Python 3. The Enthought Canopy distribution still ships with Python 2.7 (2.7.10 as of the current version, 1.6.2), so inside Canopy your script runs under Python 2.7 rather than the Python 3.5 you get at the command line.

In Python 2.x, you have the choice of using either urllib or urllib2, which expose functions like urlopen at the top level (e.g. urllib.urlopen rather than urllib.request.urlopen).
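To confirm which interpreter is actually running a script, you can print its version and path from inside Canopy and again from the command line. The following is a minimal sketch; describe_interpreter is a hypothetical helper name, not part of Canopy or the standard library, and the code is written so that it runs unchanged under both Python 2.7 and Python 3.

```python
import sys


def describe_interpreter():
    """Return the running interpreter's version and executable path."""
    # sys.version_info is a tuple such as (2, 7, 10, ...) or (3, 5, 1, ...).
    version = ".".join(str(part) for part in sys.version_info[:3])
    return "Python {0} ({1})".format(version, sys.executable)


# A 2.x version here means urllib.request cannot be imported.
# print() with a single argument behaves the same in Python 2 and 3.
print(describe_interpreter())
```

If the two environments report different major versions, that explains why the same import succeeds in one and fails in the other.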

If you want your scripts to run under both Python 3.x and Enthought Canopy's Python 2.7 distribution, there are two possible solutions:

  1. Use requests - this is generally the recommended library to use for interacting with HTTP in Python. It's a third-party module which you can install using standard pip or easy_install, or from the Canopy Package Index.

    Your equivalent code would look similar to:

     # This allows you to use the print() function inside Python 2.x
     from __future__ import print_function
     import requests

     response = requests.get("http://pythonscraping.com/pages/page1.html")
     print(response.text)

  2. Use conditional importing to bring in the correct function regardless of version. This uses only built-in features of Python and does not require third-party libraries.

    Your code would then look similar to:

     # This allows you to use the print() function inside Python 2.x
     from __future__ import print_function
     import sys

     try:
         # Try importing Python 3's urllib.request first.
         from urllib.request import urlopen
     except ImportError:
         # Looks like we're running Python 2.something.
         from urllib import urlopen

     response = urlopen("http://pythonscraping.com/pages/page1.html")

     # urllib.urlopen's response object is different based
     # on Python version.
     if sys.version_info[0] < 3:
         print(response.read())
     else:
         # Python 3's urllib responses return the
         # stream as a byte-stream, and it's up to you
         # to properly set the encoding of the stream. This
         # block just checks if the stream has a content-type set
         # and if not, it defaults to just using utf-8
         encoding = response.headers.get_content_charset()
         if not encoding:
             encoding = 'utf-8'
         print(response.read().decode(encoding))
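The charset fallback used in the Python 3 branch can be pulled into a small function, which makes it testable without a network connection. This is only a sketch: decode_body and its fallback parameter are hypothetical names introduced for illustration, and the sample bytes stand in for what response.read() would return.

```python
def decode_body(raw, charset=None, fallback="utf-8"):
    """Decode raw response bytes, using `fallback` when no charset is declared."""
    # `charset or fallback` covers both None and an empty string.
    return raw.decode(charset or fallback)


# Hypothetical usage under Python 3, with `response` returned by urlopen(...):
#     text = decode_body(response.read(),
#                        response.headers.get_content_charset())
sample = b"caf\xc3\xa9"  # UTF-8 encoded bytes standing in for response.read()
text = decode_body(sample)
```

Passing an explicit charset, for example decode_body(raw, "latin-1"), overrides the UTF-8 default when the server declares a different encoding.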
