Writing to a file on SFTP server opened using pysftp “open” method is slow

I have a piece of Python code that works, but it is very slow when writing a DataFrame directly to an SFTP location. I am using pysftp and pandas.to_csv() to read an Excel file from a remote location, run a few simple transformations, and write the result to an SFTP location.

The code snippet below takes precisely 4 minutes 30 seconds to write 100 records to the SFTP location. The DataFrames I process have at most 20 columns.

from pysftp import CnOpts, Connection
from tqdm import tqdm

def dataframe_sftp_transfer(df, destination_path):
    cnopts = CnOpts()
    cnopts.hostkeys = None  # disables host key verification
    sftp = Connection('sftp3.server.com',
                      username='user',
                      password='pwd123',
                      cnopts=cnopts)
    with sftp.open(destination_path, 'w+') as f:
        chunksize = 100
        with tqdm(total=len(df)) as progbar:
            df.to_csv(f, sep='~', index=False, chunksize=chunksize)
            # runs once, after to_csv has finished writing
            progbar.update(chunksize)

Is there a better/faster way to achieve this? Shouldn't writing files of this size take only a couple of minutes?

Using a tool like FileZilla to put files in the remote SFTP location works much faster, but that sadly rules out any form of automation.

You open the remote file without buffering. That way, every time df.to_csv writes to the file, Paramiko/pysftp sends a request to the SFTP server and waits for a response. I do not know the internals of df.to_csv, but it likely does one write per line (if not more). That would explain why the upload is so slow, particularly if your connection to the server has high latency.
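To illustrate why the number of writes matters, here is a minimal sketch using only the standard library. CountingSink and BufferedSink are hypothetical stand-ins, not Paramiko or pysftp classes: the first treats every write() as one request to the server, the second collects writes until a buffer size is reached. The csv module is used in place of df.to_csv, on the assumption that both emit roughly one write per row.

```python
import csv


class CountingSink:
    """Hypothetical stand-in for an unbuffered remote file.

    Every write() call models one request/response round trip.
    """

    def __init__(self):
        self.requests = 0
        self.chunks = []

    def write(self, data):
        self.requests += 1
        self.chunks.append(data)
        return len(data)


class BufferedSink:
    """Hypothetical wrapper that holds writes until bufsize characters are pending."""

    def __init__(self, sink, bufsize):
        self._sink = sink
        self._bufsize = bufsize
        self._pending = []
        self._size = 0

    def write(self, data):
        self._pending.append(data)
        self._size += len(data)
        if self._size >= self._bufsize:
            self.flush()
        return len(data)

    def flush(self):
        # Forward everything collected so far as a single write.
        if self._pending:
            self._sink.write("".join(self._pending))
            self._pending = []
            self._size = 0


# Header plus 100 records of 20 columns, matching the sizes in the question.
rows = [[f"col{c}" for c in range(20)]]
rows += [[r * 20 + c for c in range(20)] for r in range(100)]

unbuffered = CountingSink()
csv.writer(unbuffered, delimiter="~").writerows(rows)

backend = CountingSink()
buffered = BufferedSink(backend, 32768)
csv.writer(buffered, delimiter="~").writerows(rows)
buffered.flush()  # a real file object would flush when closed

print(unbuffered.requests, backend.requests)
```

In this model the unbuffered sink receives one write per row, while the buffered one forwards the whole file in a single write. If each write costs a network round trip, the difference grows with the latency of the connection.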

To enable buffered writes, use the bufsize parameter of Connection.open:

with sftp.open(destination_path, 'w+', 32768) as f:
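Put together, the write step from the question might look like the sketch below. The helper name write_dataframe_buffered and its default bufsize are illustrative choices, not part of pysftp or pandas. It assumes sftp is an already connected pysftp Connection and df is a pandas DataFrame.

```python
def write_dataframe_buffered(sftp, df, destination_path, bufsize=32768):
    """Write df as '~'-separated CSV to destination_path over an open connection.

    Assumptions: sftp is a connected pysftp.Connection and df is a pandas
    DataFrame. The function name and default bufsize are hypothetical.
    """
    # The third positional argument of Connection.open is bufsize;
    # a positive value makes the returned file object buffer its writes.
    with sftp.open(destination_path, "w+", bufsize) as f:
        df.to_csv(f, sep="~", index=False)
```

Leaving the with block closes the file, which should flush any data still held in the buffer.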

Similarly for reads/downloads, see:
Reading file opened with Python Paramiko SFTPClient.open method is slow


Obligatory warning: Do not set cnopts.hostkeys = None unless you do not care about security. For the correct solution, see Verify host key with pysftp.
