Decoding a b64 encoded string in Python with non-English characters

I am using a web server running Python (3.6.9) and Django. On the client I am using JavaScript to encode some information as a b64 string and send it to the server as a POST request. Then I decode the b64 string on the server in Python. However, Python raises an error when decoding the string if it contains non-English characters.

I've tried to encode and decode strings in Python and JavaScript, and the b64 encoded string is different in Python and JavaScript when the string contains non-English characters. I assume the JavaScript encoding is correct because it is able to decode it again without error, and the result is the original string with non-English characters. I need to reproduce this same behaviour in Python so I'm able to correctly decode the b64 string (generated from JavaScript) on the server.

// javascript

// encode
btoa('abcú') // YWJj+g==

// decode
atob(btoa('abcú')) // abcú

# python
import base64
import json

# encode
a=base64.b64encode('abcú'.encode('utf-8')).decode('utf-8') # YWJjw7o=

# decode
a=base64.b64decode(a).decode('utf-8') # error raised 'UnicodeEncodeError: 'ascii' codec can't encode character '\xfa' in position 3: ordinal not in range(128)'
# print(a) # I want to print the original string 'abcú' here

Both translate 'abc' to 'YWJj', but Python translates 'ú' to 'w7o=' and JavaScript translates it to '+g=='.

How can I make Python correctly decode this string with the non-English character?

The JavaScript code is probably* using the latin-1 / ISO-8859-1 encoding:

>>> s = "YWJj+g=="
>>> import base64
>>> base64.b64decode(s)
b'abc\xfa'
>>> base64.b64decode(s).decode('latin-1')
'abcú'
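
As a minimal sketch of how this could look on the server, the helper below wraps the two calls from the session above. The name `decode_btoa` is hypothetical (it is not from the question), and the sketch assumes the client calls `btoa` directly on the string, so that each decoded byte maps to the character with the same code point. The assertions also reproduce the two encodings reported in the question: 'ú' is a single byte (0xFA) in latin-1 but two bytes (0xC3 0xBA) in UTF-8, which is why the base64 output differs.

```python
import base64


def decode_btoa(b64_text: str) -> str:
    """Decode a base64 string assumed to come from JavaScript's btoa().

    Hypothetical helper: each decoded byte is interpreted as the
    character with the same code point, which is what latin-1 does.
    """
    return base64.b64decode(b64_text).decode("latin-1")


# The string from the question, as produced by btoa('abcú')
assert decode_btoa("YWJj+g==") == "abc\u00fa"

# Same text, two different byte sequences, hence two base64 strings
assert base64.b64encode("abc\u00fa".encode("latin-1")) == b"YWJj+g=="
assert base64.b64encode("abc\u00fa".encode("utf-8")) == b"YWJjw7o="
```

The non-ASCII character is written as `\u00fa` and checked with assertions rather than printed, because writing 'ú' to an ASCII-only output stream is one common way to get a `UnicodeEncodeError` like the one quoted in the question.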

* There are other 8-bit encodings, such as cp1252, which provide the same result, but latin-1 was the "universal" encoding on the web before the rise of UTF-8. It's worth noting that latin-1 only supports a limited range of non-ASCII characters; the answers to this question provide some information on using UTF-8 and base64 in the browser.
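
The footnote can be illustrated with a short Python sketch. It checks that latin-1 and cp1252 agree for the string in the question, that latin-1 cannot encode a character outside its range (the euro sign is used here purely as an example), and that UTF-8 round-trips it. The last step assumes the client sends the base64 of the UTF-8 bytes instead of calling `btoa` on the raw string. The variable names are illustrative only.

```python
import base64

raw = base64.b64decode("YWJj+g==")  # b'abc\xfa'

# For this particular input, latin-1 and cp1252 decode to the same text
assert raw.decode("latin-1") == raw.decode("cp1252") == "abc\u00fa"

# latin-1 only covers code points U+0000 to U+00FF; the euro sign
# (U+20AC) lies outside that range, so encoding it fails
try:
    "\u20ac".encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# UTF-8 can represent it; this assumes the client base64-encodes
# the UTF-8 bytes rather than the string itself
token = base64.b64encode("\u20ac".encode("utf-8")).decode("ascii")
roundtrip = base64.b64decode(token).decode("utf-8")

assert not latin1_ok
assert roundtrip == "\u20ac"
```

If the client is changed to send UTF-8 bytes, the server would decode with `'utf-8'` instead of `'latin-1'`; the two sides only need to agree on one encoding.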
