How do I download a basic file over HTTP in Python and save it to disk?

Question

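I'm new to Python, and I've been going through the Q&As on this site looking for an answer to my question. However, I'm a beginner and I find some of the solutions hard to understand. I need a very basic solution.

Could someone explain a simple way to "download a file over HTTP" and "save it to disk on Windows"?

I'm also not sure how to use the shutil and os modules.

The file I want to download is under 500 MB and is a .gz archive. If someone could also explain how to extract the archive and use the files inside it, that would be great!

Here is a partial solution that I put together from various answers: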

import requests
import os
import shutil

global dump

def download_file():
    global dump
    url = "http://randomsite.com/file.gz"
    file = requests.get(url, stream=True)
    dump = file.raw

def save_file():
    global dump
    location = os.path.abspath("D:\folder\file.gz")
    with open("file.gz", 'wb') as location:
        shutil.copyfileobj(dump, location)
    del dump

Could someone point out the errors (at a beginner's level) and explain any easier way to do this?

Thanks!

Answer 1

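A clean way to download a file is:

import urllib

# Python 2 only: in Python 3, urllib.URLopener raises AttributeError
testfile = urllib.URLopener()
testfile.retrieve("http://randomsite.com/file.gz", "file.gz")

This downloads the file from the website and saves it as file.gz. It is one of my favorite solutions, taken from "Downloading a picture via urllib and python".

This example uses the urllib library, which retrieves the file directly from the source.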

Answer 2

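As mentioned here:

import urllib
# Python 2; in Python 3 this function is urllib.request.urlretrieve
urllib.urlretrieve("http://randomsite.com/file.gz", "file.gz")

EDIT: If you still want to use requests, take a look at this question or this one.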

Answer 3

In Python 3+, URLopener is deprecated. When you use it, you get an error like this:

url_opener = urllib.URLopener()
AttributeError: module 'urllib' has no attribute 'URLopener'

So, try:

import urllib.request
urllib.request.urlretrieve(url, filename)
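The question also asks how to extract the .gz archive afterwards, which none of the calls above cover. A minimal sketch using only the standard library follows; the helper names `download` and `gunzip` are made up for illustration, and it assumes the archive is a plain gzip file holding a single compressed file (not a .tar.gz).

```python
import gzip
import shutil
import urllib.request
from pathlib import Path


def download(url, dest):
    """Fetch url and write the response body to dest (hypothetical helper)."""
    dest = Path(dest)
    urllib.request.urlretrieve(url, str(dest))
    return dest


def gunzip(src, dest):
    """Decompress the single file stored in a .gz archive into dest."""
    with gzip.open(src, "rb") as compressed, open(dest, "wb") as out:
        # copyfileobj copies in chunks, so a 500 MB file is never held in memory at once
        shutil.copyfileobj(compressed, out)
    return Path(dest)
```

On Windows it could be called as download("http://randomsite.com/file.gz", r"D:\folder\file.gz") followed by gunzip(r"D:\folder\file.gz", r"D:\folder\file"). The raw string (r"...") matters here: in a normal string literal, the "\f" in "D:\folder" is read as a form-feed character rather than a backslash and an "f".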

Answer 4

Four methods, using wget, urllib, and requests.

#!/usr/bin/python
# Python 2 script: the StringIO module and urllib.URLopener are not
# available under these names in Python 3
import requests
from StringIO import StringIO
from PIL import Image
import profile
import urllib
import wget


url = 'https://tinypng.com/images/social/website.jpg'

def testRequest():
    image_name = 'test1.jpg'
    r = requests.get(url, stream=True)
    with open(image_name, 'wb') as f:
        for chunk in r.iter_content():
            f.write(chunk)

def testRequest2():
    image_name = 'test2.jpg'
    r = requests.get(url)
    i = Image.open(StringIO(r.content))
    i.save(image_name)

def testUrllib():
    image_name = 'test3.jpg'
    testfile = urllib.URLopener()
    testfile.retrieve(url, image_name)

def testwget():
    image_name = 'test4.jpg'
    wget.download(url, image_name)

if __name__ == '__main__':
    profile.run('testRequest()')
    profile.run('testRequest2()')
    profile.run('testUrllib()')
    profile.run('testwget()')

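testRequest - 4469882 function calls (4469842 primitive calls) in 20.236 seconds

testRequest2 - 8580 function calls (8574 primitive calls) in 0.072 seconds

testUrllib - 3810 function calls (3775 primitive calls) in 0.036 seconds

testwget - 3489 function calls in 0.020 seconds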
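The large gap between testRequest and the other three is probably due less to requests itself than to iter_content() being called without arguments: its default chunk_size is 1, so the loop writes the image one byte at a time. A rough sketch of the same idea with a larger chunk size is shown below; `write_chunks` and `download_with_requests` are hypothetical helper names, and the 64 KiB chunk size is an arbitrary choice, not a measured optimum.

```python
def write_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the byte count."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            total += len(chunk)
    return total


def download_with_requests(url, path, chunk_size=64 * 1024):
    """Stream url to path in pieces of up to chunk_size bytes."""
    # requests is third-party; it is imported here so that write_chunks
    # stays usable on machines where requests is not installed
    import requests

    with requests.get(url, stream=True) as r:
        r.raise_for_status()  # fail loudly on 4xx/5xx instead of saving an error page
        return write_chunks(r.iter_content(chunk_size=chunk_size), path)
```

Calling download_with_requests(url, 'test1.jpg') in place of testRequest() should issue far fewer function calls, although the timings above were not re-measured with this change.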

Answer 5

I use wget.

It's a simple and good library. If you'd like an example:

import wget

file_url = 'http://johndoe.com/download.zip'

file_name = wget.download(file_url)

The wget module supports both Python 2 and Python 3.

Answer 6

An exotic Windows solution

import subprocess

subprocess.run("powershell Invoke-WebRequest {} -OutFile {}".format(your_url, filename), shell=True)

Answer 7

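import urllib.request  # a bare "import urllib" does not reliably load the request submodule
urllib.request.urlretrieve("https://raw.githubusercontent.com/dnishimoto/python-deep-learning/master/list%20iterators%20and%20generators.ipynb", "test.ipynb")

This downloads a single raw Jupyter notebook to a file.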

Answer 8

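I started down this path because ESXi's wget is not compiled with SSL, and I wanted to download an OVA directly from a vendor's website onto an ESXi host on the other side of the world.

I had to either disable the firewall (the lazy way) or enable outgoing https by editing the rules (the proper way).

I created this Python script: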

import ssl
import shutil
import urllib.request

# Skips certificate verification entirely; the server's identity is not checked
context = ssl._create_unverified_context()

dlurl = 'https://somesite/path/whatever'
with urllib.request.urlopen(dlurl, context=context) as response:
    # Stream the response body straight to disk
    with open("file.ova", 'wb') as tmp_file:
        shutil.copyfileobj(response, tmp_file)

The ESXi libraries are somewhat pared down, but the open-source weasel installer seemed to use urllib for https, so that inspired me to go down this path.
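Note that ssl._create_unverified_context() is a private function that turns off certificate checking. Where the host has a usable set of CA certificates (an assumption that may not hold on a stripped-down ESXi host), a variant that keeps verification on could look like the sketch below; `fetch` is a hypothetical helper name.

```python
import shutil
import ssl
import urllib.request


def fetch(url, dest, context=None):
    """Stream url to dest, verifying TLS certificates unless told otherwise."""
    if context is None:
        # Default context: certificate and hostname checks are enabled
        context = ssl.create_default_context()
    with urllib.request.urlopen(url, context=context) as response, \
            open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

Passing context=ssl._create_unverified_context() reproduces the behaviour of the script above, so the unverified path remains available as an explicit opt-in.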

Answer 9

For text files, you can use:

import requests

url = 'https://WEBSITE.com'
req = requests.get(url)
path = "C:\\YOUR\\FILE.html"

with open(path, 'wb') as f:
    f.write(req.content)

Answer 10

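Another clean way to save a file is:

import urllib.request

# urllib has no retrieve() function; in Python 3 the call is urllib.request.urlretrieve
urllib.request.urlretrieve("your url goes here", "output.csv")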

10 answers (score, date posted):

Answer 1: 222, accepted, 2013-10-26 04:59:08
Answer 2: 114, 2013-10-26 05:00:46
Answer 3: 86, 2019-05-14 11:10:18
Answer 4: 36, 2017-07-24 11:21:38
Answer 5: 35, 2014-09-13 21:13:57
Answer 6: 5, 2017-11-22 00:50:39
Answer 7: 3, 2021-01-25 18:31:56
Answer 8: 1, 2018-06-08 15:03:25
Answer 9: -1, 2020-09-21 07:17:15
Answer 10: -6, 2014-09-30 16:46:39
