The Python documentation says that urllib.request.urlretrieve returns a tuple, which is then used to open the file, as shown in Code A below.
In Code B, however, the return value of urllib.request.urlretrieve is not used, yet the code fails without that call.
Please help clarify what urllib.request.urlretrieve is doing in Code B. Thanks.
Code A
>>> import urllib.request
>>> local_filename, headers = urllib.request.urlretrieve('http://python.org/')
>>> html = open(local_filename)
>>> html.close()
Code B
import os
import tarfile
from six.moves import urllib
DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master/"
HOUSING_PATH = os.path.join("datasets", "housing") # datasets\housing
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"
def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    if not os.path.isdir(housing_path):
        os.makedirs(housing_path)
    tgz_path = os.path.join(housing_path, "housing.tgz") #datasets\housing\housing.tgz
    urllib.request.urlretrieve(housing_url, tgz_path) #what does this code here do?
    housing_tgz = tarfile.open(tgz_path)
    housing_tgz.extractall(path=housing_path)
    housing_tgz.close()
In the second code, because a filename is passed as the second argument, the downloaded content is automatically saved locally at that path. In this case, the path is tgz_path.
I'm not sure what you mean by it failing. A tuple is always returned. The only question is whether the caller assigns it to a variable. For example, the following will still work:
In [1]: import urllib.request
In [2]: urllib.request.urlretrieve('http://python.org/', 'test.python')
Out[2]: ('test.python', <http.client.HTTPMessage at 0x108d22390>)
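To make the point concrete without needing network access, here is a minimal sketch that calls urlretrieve twice on a local file:// URL, once discarding the return value (as in Code B) and once unpacking it (as in Code A). The function name demo_urlretrieve and the file names source.txt and copy.txt are made up for illustration; they are not part of the question.

```python
import os
import tempfile
import urllib.request
from pathlib import Path


def demo_urlretrieve():
    """Show that urlretrieve writes the file whether or not the tuple is kept."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for a remote resource: a local file served via file://
        source = Path(tmp) / "source.txt"
        source.write_text("hello")
        target = os.path.join(tmp, "copy.txt")

        # Code B style: the returned tuple is discarded, but the file is still written.
        urllib.request.urlretrieve(source.as_uri(), target)
        saved = os.path.exists(target)

        # Code A style: the same call, this time keeping the returned tuple.
        local_filename, headers = urllib.request.urlretrieve(source.as_uri(), target)

        return saved, local_filename == target, Path(target).read_text()


print(demo_urlretrieve())
```

In both calls the side effect (writing the file) is the same; the tuple only tells you where the file ended up and which headers came back.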
The urlretrieve() function saves web content from a URL (e.g. CSV files, images) to a local file. In your case, it saves the housing data hosted at the URL. You can check the docs [here][1]
tgz_path = os.path.join(housing_path, "housing.tgz") # local file path the download is saved to
# takes two parameters: the URL and the file path to save the content to
urllib.request.urlretrieve(housing_url, tgz_path)
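This is also why Code B fails when the urlretrieve line is removed: tarfile.open(tgz_path) needs the archive to exist on disk. A rough offline sketch of that sequence is below. It assumes a small archive built on the spot and fetched through a file:// URL in place of the real housing.tgz; the names demo_fetch and remote.tgz and the CSV contents are hypothetical.

```python
import os
import tarfile
import tempfile
import urllib.request
from pathlib import Path


def demo_fetch():
    """Mimic Code B offline: opening the archive fails until urlretrieve has saved it."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for the remote housing.tgz: a tiny local archive.
        csv_path = Path(tmp) / "housing.csv"
        csv_path.write_text("longitude,latitude\n")
        remote_tgz = Path(tmp) / "remote.tgz"
        with tarfile.open(remote_tgz, "w:gz") as tar:
            tar.add(csv_path, arcname="housing.csv")

        housing_path = os.path.join(tmp, "datasets", "housing")
        os.makedirs(housing_path, exist_ok=True)
        tgz_path = os.path.join(housing_path, "housing.tgz")

        # Without the download step there is nothing at tgz_path to open.
        try:
            tarfile.open(tgz_path)
            failed_without_download = False
        except FileNotFoundError:
            failed_without_download = True

        # The download step: copies the content behind the URL to tgz_path.
        urllib.request.urlretrieve(remote_tgz.as_uri(), tgz_path)

        # Now the archive exists and can be extracted, as in Code B.
        with tarfile.open(tgz_path) as housing_tgz:
            housing_tgz.extractall(path=housing_path)

        return failed_without_download, sorted(os.listdir(housing_path))


print(demo_fetch())
```

Using tarfile.open as a context manager closes the archive automatically, which replaces the explicit housing_tgz.close() call.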
[1]: https://docs.python.org/3/library/urllib.request.html