Python gpgme non-ascii text handling

I am trying to encrypt and decrypt text via GPG using pygpgme. It works for Western characters, but decryption fails on Russian text. I use GPG Suite on a Mac to decrypt e-mail. Here's the code I use to produce the encrypted e-mail body; note that I tried to encode the message as UTF-8 (the commented-out line), but it didn't make any difference. I use Python 2.7.

Please help, I must say I am new to Python.

import gpgme
from io import BytesIO

ctx = gpgme.Context()
ctx.armor = True
key = ctx.get_key('0B26AE38098')

payload = 'Просто тест'

#plain = BytesIO(payload.encode('utf-8'))
plain = BytesIO(payload)
cipher = BytesIO()

ctx.encrypt([key], gpgme.ENCRYPT_ALWAYS_TRUST, plain, cipher)

There are multiple problems here. You really should read the Unicode HOWTO, but I'll try to explain.


payload = 'Просто тест'

Python 2.x source code is, by default, ASCII (see PEP 263), and Python 2.7 refuses to compile a file containing non-ASCII bytes unless it has an encoding declaration. Your source clearly isn't ASCII, because ASCII doesn't have those characters. And what happens if you write Просто тест in one program (like a text editor) as UTF-8, then read it in another program as, say, Latin-1? You get something like ÐÑоÑÑо ÑеÑÑ. So, unless both sides agree on the encoding, what you're doing is creating a string full of nonsense. If you're using ISO-8859-5 rather than UTF-8, it'll be different nonsense, but still nonsense.
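The mix-up described above can be reproduced in a few lines. This is only a minimal sketch: the variable names are made up for illustration, and it is written so that it should behave the same on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Illustration only: what happens when bytes written in one encoding
# are read back in another. All names here are hypothetical.

text = u'Просто тест'

# What a UTF-8 text editor stores on disk: 2 bytes per Cyrillic letter.
utf8_bytes = text.encode('utf-8')

# Reading those bytes as Latin-1 never fails (every byte maps to some
# character), but the result is nonsense with one character per byte.
mojibake = utf8_bytes.decode('latin-1')

# The same text saved as ISO-8859-5 is a different byte sequence,
# so it would turn into different nonsense.
iso_bytes = text.encode('iso-8859-5')

# The damage is reversible only if you know which two encodings were mixed up.
repaired = mojibake.encode('latin-1').decode('utf-8')
```

Because Latin-1 assigns a character to every byte value, the wrong decode succeeds silently, which is why this kind of bug tends to show up only as garbage on the receiving end.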

So, first and foremost, you need to find out what encoding you actually used in your text editor. It's probably UTF-8 if you're on a Mac, but don't just guess; find out.


Second, you have to tell Python what encoding you used. You do that by using an encoding declaration, which must appear on the first or second line of the file. For example, if your text editor uses UTF-8, add this line to the top of your code:

# coding=utf-8

Once you fix that, payload will be a byte string, encoded in whatever encoding your text editor uses. But you can't encode already-encoded byte strings, only Unicode strings.

Python 2.x will let you call encode on them anyway, but it's not very useful: it first decodes the string to Unicode using sys.getdefaultencoding() (ASCII, unless someone has changed it), so it can then encode that. With non-ASCII bytes, that implicit decode raises UnicodeDecodeError. That's unlikely to be what you want.
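To make that hidden step visible, here is a rough sketch that spells out the decode-then-encode sequence by hand. The helper py2_style_encode is hypothetical, not part of Python; it only mimics what Python 2's str.encode does implicitly, assuming the default encoding is ASCII, and it is written out explicitly so the snippet also runs on Python 3 (where byte strings have no encode method at all).

```python
# -*- coding: utf-8 -*-
# Hypothetical helper that imitates Python 2's implicit behaviour when
# .encode() is called on a byte string. Not a real library function.

def py2_style_encode(byte_string, target='utf-8', default='ascii'):
    """Decode with the default encoding first, then encode to the target."""
    return byte_string.decode(default).encode(target)

# Bytes as a UTF-8 editor would have stored the Russian payload.
payload_bytes = u'Просто тест'.encode('utf-8')

try:
    py2_style_encode(payload_bytes)
    failed = False
except UnicodeDecodeError:
    # The ASCII decode step cannot handle bytes >= 0x80.
    failed = True

# Pure-ASCII bytes survive the round trip unchanged.
ascii_ok = py2_style_encode(b'plain test')
```

Plain ASCII input passes through untouched, so this kind of mistake can go unnoticed until the first non-ASCII message comes along.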

The right way to fix this is to make payload a Unicode string in the first place, by using a Unicode literal. Like this:

payload = u'Просто тест'

Now you can actually encode the payload to UTF-8, exactly as you did in your first (commented-out) attempt:

plain = BytesIO(payload.encode('utf-8'))

Finally, you're encrypting UTF-8 plain text with GPG. When you decrypt it on the other side, make sure to decode it as UTF-8 there as well, or again you'll probably see nonsense.
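Putting the pieces together, the encode-before-encrypt and decode-after-decrypt symmetry might look roughly like the sketch below. The two helper functions are hypothetical names introduced for illustration, and the GPG step itself is replaced by a plain byte copy so the snippet runs without pygpgme installed; in real code the stream would be passed to ctx.encrypt as in the question.

```python
# -*- coding: utf-8 -*-
from io import BytesIO


def to_plain_stream(text, encoding='utf-8'):
    """Hypothetical helper: encode Unicode text into a BytesIO for encryption."""
    return BytesIO(text.encode(encoding))


def from_decrypted_stream(stream, encoding='utf-8'):
    """Hypothetical helper: decode the bytes that a decrypt step produced."""
    return stream.getvalue().decode(encoding)


payload = u'Просто тест'
plain = to_plain_stream(payload)

# Stand-in for the GPG round trip (an assumption for this sketch): a real
# program would hand `plain` to ctx.encrypt(...) and, on the receiving side,
# get the same bytes back from decryption.
decrypted = BytesIO(plain.getvalue())

# Decode with the same encoding that was used before encryption.
recovered = from_decrypted_stream(decrypted)
```

GPG only ever sees bytes, so the encoding is a convention that the sender and the receiver have to share; keeping both directions in one pair of helpers is one way to avoid them drifting apart.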
