Reading first N lines of file on FTP server in Python
I have a CSV file on an FTP server. The file is around 200 MB.
For now, I am reading the file using the following method. The issue with this implementation is that the file takes too long to download: the retrbinary method takes around 12 minutes to execute. I tried different block sizes and was able to get the time down to 11 minutes, which is still too much.
import io
import pandas

# ftp is an already connected ftplib.FTP instance
download_file = io.BytesIO()
ftp.retrbinary("RETR {}".format(file_path), download_file.write, 8024)
download_file.seek(0)
dataframe = pandas.read_csv(download_file, nrows=4)
I need help reading the file in chunks; I only need the first 4 rows of the file.
To read only the first 4 lines of a remote file, use:
download_file = io.BytesIO()
ftp.sendcmd('TYPE A')
conn = ftp.transfercmd("RETR {}".format(file_path))
fp = conn.makefile('rb')
count = 0
while count < 4:
    line = fp.readline(ftp.maxline + 1)
    if not line:
        break
    download_file.write(line)
    count += 1
fp.close()
conn.close()
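The same steps can be wrapped in a small helper so that the data connection is closed even if reading fails. This is only a sketch: the name read_first_lines and its n_lines parameter are hypothetical, and ftp is assumed to be an already connected and logged-in ftplib.FTP instance.

```python
import io


def read_first_lines(ftp, file_path, n_lines=4):
    """Return a BytesIO holding at most n_lines lines of a remote file.

    `ftp` is assumed to be a connected, logged-in ftplib.FTP instance
    (hypothetical helper, not part of ftplib).
    """
    buffer = io.BytesIO()
    ftp.sendcmd("TYPE A")  # ASCII mode, so the data arrives line by line
    conn = ftp.transfercmd("RETR {}".format(file_path))
    try:
        with conn.makefile("rb") as fp:
            for _ in range(n_lines):
                line = fp.readline(ftp.maxline + 1)
                if not line:  # the file has fewer than n_lines lines
                    break
                buffer.write(line)
    finally:
        conn.close()  # abandon the rest of the transfer
    buffer.seek(0)  # rewind so the caller can read from the start
    return buffer
```

The returned buffer can be passed to pandas.read_csv(buffer, nrows=4), as in the question. Because the data connection is closed before the transfer finishes, the server will usually still send a final reply on the control connection, so it is probably simplest to call ftp.close() afterwards rather than reuse the session.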
Had you really wanted to process the whole file in chunks, it would be way more complicated, given the API of ftplib and Pandas. But it is possible. For some ideas, see: Get files names inside a zip file on FTP server without downloading whole archive.