
Download a file using Python

I have to download a number of files. I tried the following code in Python:

import urllib2
ul = urllib2.urlopen('http://dds.cr.usgs.gov/emodis/Africa/historical/TERRA/2012/comp_056/AF_eMTH_NDVI.2012.047-056.QKM.COMPRES.005.2012059143841.zip.sum').read()
open("D:/Thesis/test_http_dl", "w").write(ul)

It throws this error:

IOError: [Errno 13] Permission denied: 'D:/Thesis/test_http_dl'

Do you have any idea why that is? Am I doing something wrong?
I have tried different folders and it didn't work. My folders are not read-only. The result of print(repr(ul[:60])) is '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">\n<htm'.
urllib.urlretrieve() just creates a 1 KB file in the folder, which obviously is not the downloaded file.

The error tells you exactly what went wrong. You don't have permission to write to the path D:/Thesis/test_http_dl.

There are four possible reasons for that:

  1. You already have a file with that name, which you don't have write access to.
  2. You don't have access to create new files in D:\Thesis.
  3. You don't have write access to the D: drive at all (e.g., because it's a CD-ROM).
  4. Some other process has the file open for exclusive access.

You need to look at the ACLs for D:\Thesis\test_http_dl if it exists, or for D:\Thesis\ otherwise, and see whether your user (the one you're running the script as) has write access; also check whether that path or the D: drive itself has the "read-only" flag on, and whether any other process has the file open. (I don't know of any built-in tool for that last one, but handle or Process Explorer from Sysinternals can do it for you easily.)
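The first two checks can be approximated from Python itself. A minimal sketch, using the asker's path (note that on Windows os.access only reflects the read-only attribute, not the full ACLs, so a True here is necessary but not sufficient):

```python
import os

path = r"D:\Thesis\test_http_dl"
folder = os.path.dirname(path)

if os.path.exists(path):
    # Reason 1: the file exists but may not be writable by this user.
    print("file exists, writable:", os.access(path, os.W_OK))
else:
    # Reason 2: can we create new files in the containing folder?
    print("folder writable:", os.access(folder, os.W_OK))
```

If both report True and the error persists, that points at reason 3 or 4.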

Meanwhile, none of the stuff with urllib2 is at all relevant here. You can verify that by just doing this:

open("D:/Thesis/test_http_dl", "w")

You will get the exact same exception.
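You can also make the diagnosis programmatic by catching the exception and comparing its errno against the named constants in the errno module; Errno 13 is EACCES, i.e. permission denied. A sketch, again using the asker's path:

```python
import errno

try:
    f = open("D:/Thesis/test_http_dl", "w")
    f.close()
except (IOError, OSError) as e:
    if e.errno == errno.EACCES:
        # Errno 13: permission denied - check ACLs and the read-only flag.
        print("permission denied")
    elif e.errno == errno.ENOENT:
        # Errno 2: the folder in the path doesn't exist.
        print("no such directory")
    else:
        raise
```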

It's worth knowing how to figure that out the "hard" way, for cases where the exception doesn't tell you exactly what's wrong. You get an exception in a line like this:

open("D:/Thesis/test_http_dl", "w").write(ul)

Something is wrong, and if you don't have enough information to tell what it is, what do you do? Well, first, break it into pieces, so each line has exactly one operation:

f = open("D:/Thesis/test_http_dl", "w")
f.write(ul)

Now you know which one of those two gets an exception.

While you're at it, since the only thing this code depends on is ul, you can create a simpler program to test this:

ul = 'junk'
f = open("D:/Thesis/test_http_dl", "w")
f.write(ul)

Even if that doesn't help you directly, it means you don't need to wait for the download every time through the test loop, you've got something simpler to post to SO (see SSCCE for more), and it's something you can just type into the interactive interpreter. Instead of trying to guess what might be useful to print out to see why the write is raising an exception, you can start with help(f) or dir(f) and play with it live. (In this case, I'm guessing it's actually the open that fails, not the write, but you shouldn't have to guess.)

On to your second problem:

urllib.urlretrieve() just creates a 1 KB file in the folder, which obviously is not the downloaded file.

Actually, I think it is the downloaded file. You're not asking for AF_eMTH_NDVI.2012.047-056.QKM.COMPRES.005.2012059143841.zip, you're asking for AF_eMTH_NDVI.2012.047-056.QKM.COMPRES.005.2012059143841.zip.sum, which is probably a checksum file: a quasi-standard type of file containing metadata that helps you make sure the file you're downloading wasn't damaged in transit or tampered with by a hacker. A typical checksum file has one or more lines, each mapping a downloadable file to a checksum or cryptographic hash digest in some format. Sometimes there are three columns: the type of checksum/hash, the value of the checksum/hash in some stringified format, and the filename or full URL of the file. Sometimes the first column is omitted, and you have to know from elsewhere what type of checksum/hash is being used (often MD5 as a hex string). Sometimes the columns are in different orders, and sometimes they're separated by commas or tabs, in fixed-width fields, or in some other variation.
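For the two-column variant (hex digest, then filename, as produced by tools like md5sum), a parser might look like the sketch below. The function name is mine, and the format details are the general conventions described above, not anything this particular server documents:

```python
def parse_checksum_line(line):
    """Split one md5sum-style line into (hex_digest, filename).

    md5sum separates the columns with whitespace; the filename may
    be prefixed with '*' to indicate binary mode.
    """
    digest, filename = line.split(None, 1)
    return digest.lower(), filename.strip().lstrip("*")

line = ("1ba6437044bfa9259fa2d3da8f95aebd  "
        "AF_eMTH_NDVI.2012.047-056.QKM.COMPRES.005.2012059143841.zip")
print(parse_checksum_line(line))
```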

At any rate, you'd expect a .sum file to be around 80 bytes long. If you look at it in Explorer or with the dir command, its size will usually be rounded up to the nearest 1 KB. So a 1 KB file is exactly what you should see if you download it successfully.

Meanwhile:

print(repr(ul[:60])) is '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">\n<htm'

You should try printing out the rest of this, because it's probably some kind of document explaining, in human terms, what you're doing wrong. This could be because you need to pass a User-Agent, a preferred encoding, a referer, or some other header.

However, I tested the exact same line of code you used repeatedly, and ul is always:

1ba6437044bfa9259fa2d3da8f95aebd  AF_eMTH_NDVI.2012.047-056.QKM.COMPRES.005.2012059143841.zip

In other words, it's a perfectly valid checksum file, not an HTML page. So, I suspect what's really going on is that you aren't testing the same code you're showing us.
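If the .sum file really is an md5sum line like the one above, you can confirm the integrity of the downloaded .zip with the standard hashlib module. This is a general sketch of the check (the function name and sample bytes are mine, not from the question):

```python
import hashlib

def md5_matches(data, expected_hex):
    """Return True if the MD5 of `data` (bytes) equals the hex
    digest taken from the first column of the checksum file."""
    return hashlib.md5(data).hexdigest() == expected_hex.lower()

# Hypothetical usage: `zip_bytes` would be the downloaded archive,
# `expected` the digest parsed out of the .sum file.
zip_bytes = b"example payload"
expected = hashlib.md5(zip_bytes).hexdigest()
print(md5_matches(zip_bytes, expected))  # True for matching data
```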

I've tried your code and got the same error,

so try this :D

import urllib
urllib.urlretrieve('http://dds.cr.usgs.gov/emodis/Africa/historical/TERRA/2012/comp_056/AF_eMTH_NDVI.2012.047-056.QKM.COMPRES.005.2012059143841.zip.sum','C:\\path_of_your_folder\\xx.zip.sum')

Works fine for me!

import urllib2

def download(url, filename):
    # Stream the response to disk in 16 KB chunks, so large
    # files never have to fit in memory all at once.
    response = urllib2.urlopen(url)
    CHUNK = 16 * 1024
    with open(filename, 'wb') as out:
        while True:
            chunk = response.read(CHUNK)
            if not chunk:
                break
            out.write(chunk)

download(r'http://dds.cr.usgs.gov/emodis/Africa/historical/TERRA/2012/comp_056/AF_eMTH_NDVI.2012.047-056.QKM.COMPRES.005.2012059143841.zip', r'AF_eMTH_NDVI.2012.047-056.QKM.COMPRES.005.2012059143841.zip')
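In Python 3 the urllib2 module was folded into urllib.request, but the same chunked pattern applies. A sketch with the copy loop factored out so it works on any pair of file-like objects (the function names are mine, not from either answer):

```python
from urllib.request import urlopen

CHUNK = 16 * 1024

def copy_chunks(src, dst, chunk_size=CHUNK):
    """Copy from one file-like object to another in fixed-size chunks."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)

def download(url, filename):
    # urlopen returns a file-like HTTP response; stream it to disk.
    with urlopen(url) as response, open(filename, "wb") as out:
        copy_chunks(response, out)
```

Note that shutil.copyfileobj in the standard library does the same chunked copy, so copy_chunks here is mainly illustrative.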
