Trying to store a png image using ftplib and storbinary ==> UnicodeEncodeError

I'm new to StackOverflow so I'm really excited! 😊

I'm having the following issue with my code: I am trying to store a PNG image (a screenshot of a website) on an FTP server.

I'm using ftplib and selenium (with webdriver):

driver.get(<someURL>)
screenshot = driver.save_screenshot(driver.title + '.png')
ftp.storbinary("STOR <PathToServer>" + driver.title + '.png', open(driver.title + '.png', 'rb'))

This method works when the website is written with Latin characters. The problem is that the image can be a screenshot of a website located in Thailand, China, or Egypt, for instance. In that case, the line with:

open(driver.title + '.png', 'rb')

raises the infamous error:

UnicodeEncodeError: 'latin-1' codec can't encode character '\u1ec7' in position 60: ordinal not in range(256)

I understand that storbinary accepts only binary data (as the name of that method implies). However, what I don't understand is how I can "encode" the png so that it does not lead to that error and can be stored successfully on the FTP server.

Thank you so very much! Any help, comment, or insight would be deeply appreciated. Best!

The fact is that text as represented in a running computer program and text as stored in files, filenames, and databases are two different things. One can think of the former as a sequence of characters, without minding how they are represented internally. On the other hand, to store this text in a filesystem or database, or to transmit it over a network, the text has to be represented as bytes. This process of transforming the "pure" text you have while the program is running into a byte representation is called "encoding". To understand this better, I suggest reading this article.
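As a minimal sketch of the distinction described above, the snippet below encodes a `str` into `bytes` and decodes it back. The sample title is made up for illustration; only the character `\u1ec7` is taken from the traceback in the question.

```python
# Hypothetical page title, used only for illustration.
# "\u1ec7" is the character reported in the traceback.
title = "Vi\u1ec7t Nam"

# Encoding: text (str) -> bytes, using an explicit codec.
encoded = title.encode("utf-8")

# Decoding: bytes -> text, using the same codec.
decoded = encoded.decode("utf-8")

print(type(title).__name__, len(title))      # characters in the text
print(type(encoded).__name__, len(encoded))  # bytes in the encoded form
print(decoded == title)                      # the round trip is lossless
```

The text has 8 characters but its UTF-8 form has 10 bytes, because the one non-ASCII character takes three bytes.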

Python 3, both the core language and the libraries, tries to automatically select the proper text encoding when doing any I/O with text. In your case, it picked the "latin1" codec for filenames on the target server.
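To see where that codec comes into play, here is a small sketch that needs no server. It assumes, as far as I can tell from ftplib, that each command line is encoded with the connection's `encoding` attribute before it is sent (latin-1 by default before Python 3.9, utf-8 from 3.9 on). The title in the command string is hypothetical.

```python
from ftplib import FTP

# Creating an FTP object without a host does not open a connection.
ftp = FTP()
print(ftp.encoding)  # codec used for commands; the default depends on the Python version

# Hypothetical STOR command containing the character from the traceback.
command = "STOR Vi\u1ec7t Nam.png"

# Encoding the command as latin-1 by hand reproduces the same exception.
error_message = None
try:
    command.encode("latin-1")
except UnicodeEncodeError as exc:
    error_message = str(exc)
    print(error_message)
```

If the server accepts UTF-8 file names, setting `ftp.encoding = "utf-8"` before the transfer may avoid the problem altogether, but whether that works depends on the server.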

Latin1 is a single-byte encoding with at most 256 code points, so it can't represent most characters or glyphs: any non-Western language character, and even some Western ones, such as Ĺ, Ṕ, ŵ, can't be represented with it.
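A quick way to check whether a given title would survive is to try the encoding and catch the error. The helper name `fits_latin1` and the sample strings are made up for this sketch.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character in text can be encoded as latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Sample strings: accented Western text, two of the letters named above,
# and one Chinese character.
samples = ["caf\u00e9", "\u0139", "\u0175", "\u4e2d"]
for sample in samples:
    print(repr(sample), fits_latin1(sample))
```

Only the first sample fits; the others contain code points above 255.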

The suggestion is to encode the name manually before letting Python do so, because then we control how characters that do not exist in the target encoding are handled. Since the library method (.storbinary) seems to expect the filename as a string, we then "decode" the name back, keeping the replacements for invalid characters we got when first encoding, and pass the result of this round trip to the library.

So, to keep the information about the characters that do not exist in latin1, I'd suggest using an escape encoding. Other options would be to replace them with a "?" or to ignore them, simply suppressing all such characters:

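# Round-trip the title through latin1, escaping anything it cannot represent
filename = driver.title.encode("latin1", errors="xmlcharrefreplace").decode("latin1") + ".png"
driver.save_screenshot(filename)
# Open the screenshot in binary mode and close it once the upload finishes
with open(filename, 'rb') as image_file:
    ftp.storbinary("STOR <PathToServer>" + filename, image_file)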
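To compare the three options mentioned above, this sketch runs the same round trip with each error handler. The title and the helper name `latin1_safe` are made up for illustration.

```python
# Hypothetical page title containing the character from the traceback.
title = "Vi\u1ec7t Nam"


def latin1_safe(text: str, errors: str) -> str:
    """Round-trip text through latin-1 using the given error handler."""
    return text.encode("latin-1", errors=errors).decode("latin-1")


for handler in ("xmlcharrefreplace", "replace", "ignore"):
    print(handler, "->", latin1_safe(title, handler))
# xmlcharrefreplace -> Vi&#7879;t Nam
# replace -> Vi?t Nam
# ignore -> Vit Nam
```

Only `xmlcharrefreplace` keeps enough information to recover the original character later. Note that "?" is not a valid filename character on some systems, which is one more reason to prefer the escape form.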
