django/python: How does Python encode non-English characters?

I am doing some string manipulation and trying to store the results in a database. Then I encountered this string (I believe it's German):

Sichere Administration von VoIP-EndgerÃ¤ten

After I put it into the database, I noticed that the non-English character had become:

Sichere Administration von VoIP-Endger\u00e4ten

When I fetched it from the database and passed the string to subprocess.Popen(), it raised an error:

TypeError: execv() arg 2 must contain only strings

My question is: how did this happen? Also, does anybody have any useful references for learning about encoding and decoding? Thanks.

Yes, read the Python Unicode HOWTO; you are dealing with encoded and unicode text.

The first string is UTF-8 data being interpreted as Latin-1; the second string is a unicode string and cannot be passed to Popen() without encoding it first:

>>> print u'\u00e4'  # A unicode escape code for the latin-1 character ä
ä
>>> u'\u00e4'.encode('utf8')  # The same character encoded to UTF-8
'\xc3\xa4'
>>> print u'\u00e4'.encode('utf8').decode('latin1')  # Misinterpreted as Latin-1
Ã¤
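If the stored text really is UTF-8 that was decoded as Latin-1 somewhere along the way, the mistake can usually be reversed by re-encoding to Latin-1 and decoding as UTF-8. The following is only a sketch: `fix_mojibake` is a hypothetical helper name, and it assumes the damage was exactly one wrong Latin-1 decode. It will raise an error or produce wrong text on data that was garbled differently.

```python
# -*- coding: utf-8 -*-

def fix_mojibake(text):
    """Undo UTF-8 bytes that were wrongly decoded as Latin-1 (hypothetical helper)."""
    # Latin-1 maps code points 0-255 straight back to the original bytes,
    # which can then be decoded with the encoding they were really in.
    return text.encode('latin1').decode('utf8')

# u'\u00c3\u00a4' is the two-character garbling of a single a-umlaut.
garbled = u'Sichere Administration von VoIP-Endger\u00c3\u00a4ten'
repaired = fix_mojibake(garbled)
```

The longer-term fix is to find where the wrong decode happens (often a database connection or file read with the wrong charset) rather than to repair strings after the fact.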

You'll need to figure out what encoding your external process can handle and call .encode() on your data before passing it to Popen().
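As a rough sketch of that last step, the arguments can be encoded just before the call. Here `encode_args` and `run_echo` are hypothetical names, UTF-8 is only an assumption about what the external program expects, and `echo` stands in for the real command on a POSIX system.

```python
# -*- coding: utf-8 -*-
import subprocess

def encode_args(args, encoding='utf-8'):
    """Return a copy of args with unicode text encoded to byte strings."""
    text_type = type(u'')  # unicode on Python 2, str on Python 3
    return [a.encode(encoding) if isinstance(a, text_type) else a
            for a in args]

def run_echo(title, encoding='utf-8'):
    """Pass title to an external command as encoded bytes (POSIX only)."""
    # 'echo' is a placeholder for whatever program actually gets launched.
    args = encode_args([u'echo', title], encoding)
    proc = subprocess.Popen(args, stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return out  # raw bytes written by the child process
```

Calling `run_echo(u'Sichere Administration von VoIP-Endger\u00e4ten')` hands the child process byte strings only, which avoids the `execv()` TypeError from the question. If the external program expects a different encoding, pass it as the `encoding` argument.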
