Reading the first N lines of a file on an FTP server in Python

I have a CSV file on an FTP server. The file is around 200 MB.

For now, I am reading the file using the following method. The issue with this implementation is that the file takes too long to download: the retrbinary method takes around 12 minutes to execute. I tried different block sizes and managed to get the time down to 11 minutes, which is still too slow.

import io
import pandas

download_file = io.BytesIO()
# Downloads the whole ~200 MB file into memory before parsing anything
ftp.retrbinary("RETR {}".format(file_path), download_file.write, 8024)
download_file.seek(0)
dataframe = pandas.read_csv(download_file, nrows=4)

I need help reading the file in chunks; I only need the first 4 rows of the file.
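Why changing the block size barely helps: retrbinary keeps calling its callback until the server closes the data connection, so the block size only changes how many bytes each recv call returns, not how much data is transferred in total. If you want to keep using retrbinary, one way to stop early is to raise an exception from the callback. The following is a minimal sketch, assuming an already-connected ftplib.FTP instance; the names _EnoughData, make_line_collector and head_via_retrbinary are hypothetical, and the handling of the server's final reply after an early close (often 226 or 426) is an assumption that can vary between servers.

```python
import ftplib
import io


class _EnoughData(Exception):
    """Hypothetical marker exception used to break out of retrbinary early."""


def make_line_collector(buffer, n):
    """Return a retrbinary callback that raises once buffer holds n newlines."""
    def callback(block):
        buffer.write(block)
        # Stop the transfer as soon as enough complete lines have arrived
        if buffer.getvalue().count(b"\n") >= n:
            raise _EnoughData
    return callback


def head_via_retrbinary(ftp, path, n=4, blocksize=8192):
    """Return at most the first n lines of path as bytes."""
    buffer = io.BytesIO()
    try:
        ftp.retrbinary("RETR {}".format(path),
                       make_line_collector(buffer, n), blocksize)
    except _EnoughData:
        # retrbinary closed the data connection while unwinding, but it never
        # read the server's final reply; consume it here. Assumption: the
        # server sends one reply, typically 226 or 426 ("transfer aborted").
        try:
            ftp.voidresp()
        except ftplib.error_temp:
            pass
    # The last block may contain lines beyond n; drop them
    lines = buffer.getvalue().splitlines(keepends=True)[:n]
    return b"".join(lines)
```

If the file has fewer than n lines, retrbinary simply finishes normally and the whole (short) file is returned.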

To read only the first 4 lines of the remote file, use:

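import io
import pandas

download_file = io.BytesIO()

# Open the data connection manually (ASCII mode), so reading can stop
# after a few lines instead of downloading the whole file
ftp.sendcmd('TYPE A')
conn = ftp.transfercmd("RETR {}".format(file_path))
fp = conn.makefile('rb')
count = 0
while count < 4:
    # Read one line, capped at ftp.maxline + 1 bytes (as ftplib itself does)
    line = fp.readline(ftp.maxline + 1)
    if not line:
        break  # the file has fewer than 4 lines
    download_file.write(line)
    count += 1
fp.close()
conn.close()

# Rewind the buffer before handing it to pandas
download_file.seek(0)
dataframe = pandas.read_csv(download_file)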
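The same technique can be packaged as a reusable function. This is a sketch, assuming an already-connected ftplib.FTP instance; read_first_lines and take_lines are hypothetical names. Unlike the snippet in the answer, it also reads the server's final reply on the control connection after closing the data connection early, so the same session can be used for further commands. Which reply code a server sends in that situation (commonly 226 or 426) is an assumption.

```python
import ftplib
import io


def take_lines(fp, n, maxline=8192):
    """Read at most n lines from the binary file-like object fp."""
    out = io.BytesIO()
    for _ in range(n):
        line = fp.readline(maxline + 1)
        if not line:  # end of data: fewer than n lines available
            break
        out.write(line)
    return out.getvalue()


def read_first_lines(ftp, path, n=4):
    """Fetch only the first n lines of path over an ftplib.FTP session."""
    ftp.sendcmd("TYPE A")
    conn = ftp.transfercmd("RETR {}".format(path))
    try:
        with conn.makefile("rb") as fp:
            data = take_lines(fp, n, ftp.maxline)
    finally:
        conn.close()
    # Closing the data connection early usually makes the server answer on
    # the control connection; read that reply so the session stays in sync.
    try:
        ftp.voidresp()
    except ftplib.error_temp:
        pass  # e.g. 426: the server reports the transfer was aborted
    return data
```

Because the first line of a CSV is usually the header, n=5 gives the header plus 4 data rows, which can then be parsed with pandas.read_csv(io.BytesIO(data)).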

Had you really wanted to process the whole file in chunks, it would be considerably more complicated, given the APIs of ftplib and pandas. But it is possible. For some ideas, see: Get files names inside a zip file on FTP server without downloading whole archive.
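One possible way to process the whole file in chunks, assuming purely sequential reading is enough, is to stream the data connection and parse it with Python's standard csv module, handing out a fixed number of rows at a time. This is a sketch with hypothetical names (iter_row_chunks, process_remote_csv) and an assumed UTF-8 encoding; it is not the approach from the linked answer, and it does not use pandas.

```python
import csv
import io
import itertools


def iter_row_chunks(binary_fp, chunk_rows=10_000, encoding="utf-8"):
    """Yield lists of up to chunk_rows parsed CSV rows from a binary stream."""
    # newline="" lets the csv module handle CRLF/LF line endings itself
    reader = csv.reader(io.TextIOWrapper(binary_fp, encoding=encoding, newline=""))
    while True:
        chunk = list(itertools.islice(reader, chunk_rows))
        if not chunk:
            return
        yield chunk


def process_remote_csv(ftp, path, handle_chunk, chunk_rows=10_000):
    """Stream path over an ftplib.FTP session, passing row chunks to handle_chunk."""
    ftp.sendcmd("TYPE I")  # binary mode: keep the file's own line endings
    with ftp.transfercmd("RETR {}".format(path)) as conn:
        with conn.makefile("rb") as fp:
            for chunk in iter_row_chunks(fp, chunk_rows):
                handle_chunk(chunk)
    # After a complete transfer the server should reply 226 (or 250)
    return ftp.voidresp()
```

Only one chunk of rows is held in memory at a time. Each chunk is a plain list of string lists; turning it into a pandas DataFrame (keeping the header row from the first chunk) is left to the caller.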
