Read the first N lines of a file on an FTP server in Python

Question

I have a CSV file on an FTP server. The file is around 200 MB.

For now, I am reading the file using the following method. The problem with this implementation is that the file takes too long to download: the retrbinary call takes around 12 minutes to execute. I tried different block sizes and got the time down to 11 minutes, which is still too long.

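import io
import pandas

# ftp is an already-logged-in ftplib.FTP instance; file_path is the remote CSV path
download_file = io.BytesIO()
ftp.retrbinary("RETR {}".format(file_path), download_file.write, 8024)  # downloads the entire file
download_file.seek(0)
dataframe = pandas.read_csv(download_file, nrows=4)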

I need help reading the file in chunks; I only need the first 4 rows of the file.

Answer 1

To read only the first 4 lines of a remote file, use:

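import ftplib
import io
import pandas

# ftp is an already-logged-in ftplib.FTP instance; file_path is the remote CSV path
download_file = io.BytesIO()

ftp.sendcmd('TYPE A')  # ASCII mode, as ftplib's retrlines() uses
conn = ftp.transfercmd("RETR {}".format(file_path))  # open the data connection
fp = conn.makefile('rb')
count = 0
while count < 4:  # note: with a header row, 4 data rows need 5 lines
    line = fp.readline(ftp.maxline + 1)
    if not line:  # the file has fewer than 4 lines
        break
    download_file.write(line)
    count += 1
fp.close()
conn.close()  # closing the data connection early aborts the rest of the transfer
try:
    ftp.voidresp()  # consume the server's final reply so later commands stay in sync
except (ftplib.error_temp, ftplib.error_perm, ftplib.error_reply):
    pass  # e.g. a 426/451 "transfer aborted" reply caused by the early close
download_file.seek(0)
dataframe = pandas.read_csv(download_file, nrows=4)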
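
The same steps can be wrapped in a reusable helper for an arbitrary N, as in the title. This is a minimal sketch, assuming `ftp` is an already-logged-in `ftplib.FTP` instance; the names `take_lines` and `read_first_lines` are hypothetical. The `voidresp()` call reads the server's final reply (often a 4xx "transfer aborted" code after the data connection is closed early) so that the control connection can still be used afterwards.

```python
import ftplib
import io


def take_lines(fp, n, maxline=ftplib.MAXLINE):
    """Read at most n lines from the binary file object fp and return them as bytes."""
    buf = io.BytesIO()
    for _ in range(n):
        line = fp.readline(maxline + 1)
        if not line:  # end of data reached before n lines
            break
        buf.write(line)
    return buf.getvalue()


def read_first_lines(ftp, file_path, n):
    """Download only the first n lines of file_path from a connected ftplib.FTP (hypothetical helper)."""
    ftp.sendcmd("TYPE A")  # ASCII mode, line-oriented transfer
    conn = ftp.transfercmd("RETR {}".format(file_path))
    try:
        with conn.makefile("rb") as fp:
            data = take_lines(fp, n, ftp.maxline)
    finally:
        conn.close()  # stop the transfer once enough lines were read
    try:
        ftp.voidresp()  # read the final reply to keep the control connection in sync
    except (ftplib.error_temp, ftplib.error_perm, ftplib.error_reply, ftplib.error_proto):
        pass  # an "aborted" reply is expected after closing early
    return data
```

For example, `pandas.read_csv(io.BytesIO(read_first_lines(ftp, file_path, 5)), nrows=4)` would yield 4 data rows from a file with a header line, since pandas treats the first line as the header by default. Line counting assumes no quoted field contains an embedded newline.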

If you really wanted to process the whole file in chunks, it would be much more complicated, given the APIs of ftplib and pandas. But it is possible. For some ideas, see: Get files names inside a zip file on FTP server without downloading whole archive.
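
One possible alternative, sketched here under stated assumptions, is to skip pandas and stream the data connection through the standard-library `csv` module in fixed-size batches, so that memory use stays bounded and processing starts before the download finishes (the transfer itself is not faster). It assumes the file is UTF-8 encoded and that rows can be handled strictly in order; `iter_csv_batches`, `process_remote_csv` and the `handle_batch` callback are hypothetical names.

```python
import csv
import io
from itertools import islice


def iter_csv_batches(fp, batch_size):
    """Yield lists of up to batch_size parsed rows from the binary file object fp."""
    # Assumption: UTF-8 content; newline="" lets the csv module handle \r\n itself.
    text = io.TextIOWrapper(fp, encoding="utf-8", newline="")
    reader = csv.reader(text)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


def process_remote_csv(ftp, file_path, handle_batch, batch_size=10000):
    """Stream file_path from a connected ftplib.FTP and pass each batch of rows to handle_batch."""
    ftp.voidcmd("TYPE I")  # binary mode; line endings are left to the csv module
    conn = ftp.transfercmd("RETR {}".format(file_path))
    try:
        with conn.makefile("rb") as fp:
            for batch in iter_csv_batches(fp, batch_size):
                handle_batch(batch)
    finally:
        conn.close()
    ftp.voidresp()  # expect a 2xx "transfer complete" reply after reading everything
```

A caller could, for instance, pass `handle_batch=lambda rows: print(len(rows))` to see the batch sizes; the first batch includes the header row.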

Answer 1 (1), 2021-01-05 15:43:04
