Ignore missing file while downloading with Python ftplib
I am trying to download a certain file (named 010010-99999-year.gz) from an FTP server. This same file, for different years, resides in different FTP directories. For instance:
ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-lite/2000/010010-99999-1973.gz, ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-lite/2001/010010-99999-1974.gz, and so on. The picture illustrates one of the directories:
The file is not located in all the directories (i.e. all years). In such a case I want the script to ignore the missing file, print "not available", and continue with the next directory (i.e. the next year).

I could do this by first generating an NLST listing of the current FTP directory and then checking whether my file is in that list, but that is slow, and NOAA (the organization owning the server) discourages file listings ( source ). Therefore I came up with this code:
def FtpDownloader2(url="ftp.ncdc.noaa.gov"):
    ftp=FTP(url)
    ftp.login()
    for year in range(1901,2015):
        ftp.cwd("/pub/data/noaa/isd-lite")
        ftp.cwd(str(year))
        fullStationId="010010-99999-%s.gz" % year
        try:
            file=open(fullStationId,"wb")
            ftp.retrbinary('RETR %s' % fullStationId, file.write)
            print("File is available")
            file.close()
        except:
            print("File not available")
    ftp.close()
This downloads the existing files (years 1973-2014) correctly, but it also generates empty files for the years 1901-1972, even though the file is not on the FTP server for those years. Am I doing anything wrong in the use of try and except, or is it some other issue?
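The empty files can be reproduced without any FTP server at all: opening a file in "wb" mode creates (or truncates) it on disk immediately, before any data arrives, so a failed transfer still leaves a zero-byte file behind. A minimal local sketch of that behavior (the raised exception simulates a failed RETR):

```python
import os

path = "demo-empty.gz"
try:
    f = open(path, "wb")                  # the file is created here, even though...
    raise IOError("550 No such file")     # ...the download then fails
except IOError:
    print("File not available")
finally:
    f.close()

# The zero-byte file remains on disk after the failure.
print(os.path.exists(path), os.path.getsize(path))  # True 0
os.remove(path)
```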
I took your code and modified it a little:
from ftplib import FTP, error_perm
import os
def FtpDownloader2(url="ftp.ncdc.noaa.gov"):
    ftp = FTP(url)
    ftp.login()
    for year in range(1901, 2015):
        remote_file = '/pub/data/noaa/isd-lite/{0}/010010-99999-{0}.gz'.format(year)
        local_file = os.path.basename(remote_file)
        try:
            with open(local_file, "wb") as file_handle:
                ftp.retrbinary('RETR %s' % remote_file, file_handle.write)
                print('OK', local_file)
        except error_perm:
            print('ERR', local_file)
            os.unlink(local_file)
    ftp.close()
The main differences from your code:

- There is no bare except clause without a specific exception class. That type of construct will ignore all errors, making it hard to troubleshoot. Instead, only error_perm is caught, which is the exception ftplib raises for permanent FTP errors such as 550 (file unavailable).
- The with statement guarantees that the local file is closed, even when an exception occurs.
- When an error_perm exception occurred, a sign that the file is not available from the server, I deleted the local file.
- I did not call cwd twice per iteration, which slows down the process; the full remote path is passed to RETR instead.
- range(1901, 2015) will not include 2015. If you want it, you have to specify range(1901, 2016).
This update answers your question regarding not creating an empty local file (and then having to delete it). There are a couple of different ways:
I think the problem is within your try/except block, where you open a file handle for a new file before checking whether the file exists on the server:
try:
    file=open(fullStationId,"wb")
    ftp.retrbinary('RETR %s' % fullStationId, file.write)
    print("File is available")
    file.close()
except:
    print("File not available")
Instead, add a statement in the except block to close the file handle, and another statement to remove the file if it is empty.
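That fix could look like the sketch below, which restructures one iteration of the question's loop into a helper (download_station is a hypothetical name; the ftp argument is assumed to be an already logged-in connection to the NOAA server):

```python
import os
from ftplib import FTP, error_perm

def download_station(ftp, year):
    """Download one year's file, cleaning up any empty leftover file.

    Assumes `ftp` is an already logged-in FTP connection and the path
    layout matches the question's server.
    """
    name = "010010-99999-%s.gz" % year
    remote = "/pub/data/noaa/isd-lite/%s/%s" % (year, name)
    f = open(name, "wb")                    # created immediately, so clean up below
    try:
        ftp.retrbinary("RETR %s" % remote, f.write)
        print("File is available")
    except error_perm:
        print("File not available")
    finally:
        f.close()                           # close the handle in every case
        if os.path.getsize(name) == 0:      # remove the leftover empty file
            os.remove(name)
```

The cleanup lives in finally rather than except so the handle is closed and a zero-byte file is removed regardless of how the transfer ended.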
Another possibility is to open the file for writing locally only if the file exists and has a non-zero size on the server, which you can check with ftp.size.