Python Paramiko UTF-8 error when trying to stream file from SFTP server

I have a program that uses Paramiko to get files from an SFTP server. Originally I pulled each file locally with get and then processed it by opening the local copy. Now I am trying to avoid the get and read the file as a stream instead. This works fine until I encounter bytes that are not valid UTF-8, such as <96> (byte 0x96). When that happens the program raises an exception. The problem occurs on the line:

for line in remote_file:

So I am not able to get the data from the stream. I have seen mentions of decoding and re-encoding, but I don't see how to do that, since Paramiko raises the exception before handing me any data.

Is there a Paramiko parameter that says what to do or provides some way to just get the raw data? How do I get around this issue?

Below is the code in question. The first 3 lines establish the connection. Then I have some code (not shown) that filters the directory listing down to the files I care about. The next-to-last line opens the file on the SFTP server. The last line is where the error occurs; I have a try block around the whole block of code. When the exception is hit, the error returned is:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 124: invalid start byte

ftpTransport = paramiko.Transport((FTPSERVER, FTPPORT))
ftpTransport.connect(username=FTPUSERNAME, password=FTPPASSWORD)
sftp = paramiko.SFTPClient.from_transport(ftpTransport)
remote_file = sftp.open(remoteName)
for line in remote_file:  # UnicodeDecodeError is raised on this line

I do not get the UTF-8 error if I do an sftp.get and then open the local file. For now I have changed my code to take that step, but I would prefer not to copy the file locally if I don't have to.

Paramiko assumes that all text files are UTF-8 and uses "strict" decoding (aborting on any error).
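The failure can be reproduced without an SFTP server, since it is ordinary strict UTF-8 decoding of a byte that cannot start a UTF-8 sequence. The following is a minimal sketch; the sample bytes and the helper name try_strict_utf8 are made up for illustration and are not part of Paramiko:

```python
# Hypothetical sample data containing the offending byte 0x96.
raw = b"price 10 \x96 20\n"


def try_strict_utf8(data):
    """Decode bytes as strict UTF-8 and report the failure instead of raising."""
    try:
        return data.decode("utf-8")  # "strict" is the default error handler
    except UnicodeDecodeError as exc:
        return f"failed: {exc.reason} at position {exc.start}"


result = try_strict_utf8(raw)
print(result)
```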

To work around this strict decoding, open the file in binary mode. Then next(), readline() and similar methods return bytes objects, which you can decode using any encoding you like, or decode as UTF-8 while ignoring errors:

remote_file = sftp.open(remoteName, "rb")
for line in remote_file:
    print(line.decode("utf8", "ignore"))
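If dropping the undecodable bytes is not acceptable, the same loop can decode with a different encoding or error handler. The sketch below assumes the file might be Windows-1252 (cp1252), where byte 0x96 is an en dash; that is only a guess about the data, so check the real encoding of your files. The helper decode_lines is hypothetical, and io.BytesIO stands in for the object returned by sftp.open(remoteName, "rb"), since both yield bytes lines when iterated:

```python
import io


def decode_lines(binary_lines, encoding="utf-8", errors="strict"):
    """Yield text lines from any iterable that produces bytes lines."""
    for raw in binary_lines:
        yield raw.decode(encoding, errors)


# Stand-in for a remote file opened in binary mode (sample bytes are made up).
sample = io.BytesIO(b"10 \x96 20\nplain\n")

# Assumption: the data is really cp1252, so 0x96 becomes an en dash.
as_cp1252 = list(decode_lines(sample, encoding="cp1252"))

# Alternative: keep UTF-8 but mark bad bytes with U+FFFD instead of dropping them.
sample.seek(0)
as_replaced = list(decode_lines(sample, errors="replace"))

print(as_cp1252)
print(as_replaced)
```

With a real connection, the call would look like decode_lines(sftp.open(remoteName, "rb"), encoding="cp1252"), using the names from the question.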
