django/python: How does Python encode non-English characters?
I am doing some string manipulation and trying to put the results into a database. Then I encountered this (I believe it's German):
Sichere Administration von VoIP-Endgeräten
After I put it into the database, I realized that the non-English characters had become:
Sichere Administration von VoIP-Endger\u00e4ten
and when I fetched it from the database and passed this string to subprocess.Popen(), it gave this error:
TypeError: execv() arg 2 must contain only strings
My question is: how did this happen? Also, does anybody have any useful references for learning about encoding and decoding?
Thanks.
Yes, read the Python Unicode HOWTO; you are dealing with encoded and unicode text.
The first string is UTF-8 data being interpreted as Latin-1; the second string is a unicode string and cannot be passed to Popen() without encoding first:
>>> print u'\u00e4' # A unicode escape code for the latin-1 character ä
ä
>>> u'\u00e4'.encode('utf8') # The same character encoded to UTF-8
'\xc3\xa4'
>>> print u'\u00e4'.encode('utf8').decode('latin1') # Misinterpreted as Latin-1
Ã¤
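For comparison, here is the same round trip as a minimal sketch in Python 3 syntax (an assumption; the session above is Python 2), where str is Unicode text and bytes is encoded data. The variable names are illustrative only. It also suggests that the misreading can be undone as long as no bytes were lost along the way:

```python
# Python 3 sketch: str is Unicode text, bytes is encoded data.
original = "Sichere Administration von VoIP-Endger\u00e4ten"

# Encode the text to UTF-8; the single character U+00E4 becomes two bytes.
utf8_bytes = original.encode("utf-8")

# Decoding those bytes with the wrong codec (Latin-1) produces the garbled
# form: each of the two bytes is read as a separate character.
garbled = utf8_bytes.decode("latin-1")

# Undo the mistake: re-encode as Latin-1 to recover the original bytes,
# then decode them with the codec that was actually used (UTF-8).
repaired = garbled.encode("latin-1").decode("utf-8")

print(repaired == original)
```

This only works because Latin-1 maps every byte value to a character, so the wrong decode step loses no information.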
You'll need to figure out what encoding your external process can handle and call .encode() on your data before passing it to Popen().
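One way to apply this advice, sketched in Python 3 syntax and assuming the external program expects UTF-8 (check what yours actually expects). The helper name encode_args and the command name are hypothetical and not part of subprocess:

```python
def encode_args(args, encoding="utf-8"):
    """Return a copy of args with every text item encoded to bytes.

    Items that are already bytes are passed through unchanged.
    The default encoding is an assumption; use whatever the external
    process is known to handle.
    """
    encoded = []
    for arg in args:
        if isinstance(arg, str):
            # Raises UnicodeEncodeError here if the text cannot be
            # represented in the chosen encoding.
            encoded.append(arg.encode(encoding))
        else:
            encoded.append(arg)
    return encoded


# "some-command" is a placeholder for the real external program.
argv = encode_args(["some-command", "Sichere Administration von VoIP-Endger\u00e4ten"])
```

The resulting list of byte strings is what would then be handed to Popen() in place of the original unicode values. In Python 2 the same idea amounts to calling .encode('utf8') on each unicode value.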