
What does urllib.request.urlretrieve do if its return value is not used?

According to the Python documentation, urllib.request.urlretrieve returns a tuple, and the first element of that tuple is then used to open the file, as shown in Code A below.

However, in Code B the return value of urllib.request.urlretrieve is not used, yet the code fails without that call. Please help clarify what urllib.request.urlretrieve is doing in Code B. Thanks.

Code A

>>> import urllib.request
>>> local_filename, headers = urllib.request.urlretrieve('http://python.org/')
>>> html = open(local_filename)
>>> html.close()

Code B

import os
import tarfile
from six.moves import urllib

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master/"
HOUSING_PATH = os.path.join("datasets", "housing") # datasets\housing
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"

def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    if not os.path.isdir(housing_path):
        os.makedirs(housing_path)
    tgz_path = os.path.join(housing_path, "housing.tgz") #datasets\housing\housing.tgz
    urllib.request.urlretrieve(housing_url, tgz_path) #what does this code here do?
    housing_tgz = tarfile.open(tgz_path)
    housing_tgz.extractall(path=housing_path)
    housing_tgz.close()

In Code B, passing the filename argument makes urlretrieve save the downloaded content to that local path, which here is tgz_path. Without that call the archive is never written to disk (unless it already exists from an earlier run), so tarfile.open(tgz_path) has nothing to open.

I'm not sure what you mean by it failing. A tuple is always returned; the only question is whether you assign it to a variable. For example, the following still works:

In [1]: import urllib.request

In [2]: urllib.request.urlretrieve('http://python.org/', 'test.python')
Out[2]: ('test.python', <http.client.HTTPMessage at 0x108d22390>)
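
To make the point concrete, here is a minimal sketch of the two calling styles. The helper names download_ignoring_result and download_keeping_result are hypothetical and exist only for illustration; both wrap the same urllib.request.urlretrieve call, and in both cases the file is written as a side effect of the call.

```python
import urllib.request


def download_ignoring_result(url, target_path):
    """Download url to target_path and discard the returned tuple."""
    # The file is written as a side effect of the call itself,
    # so nothing needs to be assigned for the download to happen.
    urllib.request.urlretrieve(url, target_path)


def download_keeping_result(url, target_path):
    """Same download, but hand the (filename, headers) tuple back."""
    return urllib.request.urlretrieve(url, target_path)
```

Code B corresponds to the first style: it only needs the file on disk and already knows the path, so the tuple is not stored.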

The urlretrieve() function saves the content found at a URL (e.g. CSV files, images) to a local file. In your case it saves the housing data archive located at that URL. You can check the docs [here][1]

# tgz_path is the local file path the download is written to
tgz_path = os.path.join(housing_path, "housing.tgz")

# takes two parameters: the URL and the file path to save the content to
urllib.request.urlretrieve(housing_url, tgz_path)
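
For reference, the download-then-extract flow of Code B could be sketched as below. This is not the book's code: fetch_and_extract and its archive_name parameter are hypothetical names, it uses pathlib and the Python 3 urllib.request module directly instead of six.moves, and it returns the tuple only to show that it is available if wanted.

```python
import tarfile
import urllib.request
from pathlib import Path


def fetch_and_extract(archive_url, dest_dir, archive_name="housing.tgz"):
    """Download archive_url into dest_dir, then unpack it there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive_path = dest / archive_name

    # Side effect: the bytes at archive_url are written to archive_path.
    # The return value is (filename, headers); filename echoes the path given.
    saved_path, headers = urllib.request.urlretrieve(archive_url, str(archive_path))

    # The archive now exists on disk, so tarfile can open it.
    with tarfile.open(archive_path) as archive:
        archive.extractall(path=dest)

    return Path(saved_path), headers
```

Removing the urlretrieve line from this sketch would leave archive_path missing on a first run, and tarfile.open would raise an error, which matches the failure described in the question.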


  [1]: https://docs.python.org/3/library/urllib.request.html
