简体   繁体   English

Python中的UTF-8字符串解码

[英]UTF-8 String decoding in Python

In a project I need a PHP and a Python module (Python 3.5.2). 在一个项目中,我需要一个PHP和一个Python模块(Python 3.5.2)。 As well as a configfile which both modules use. 以及两个模块都使用的配置文件。 The Python configparser has problems reading special characters from the configfile, like german mutated vowel (ä,ö,ü, eg). Python configparser在从配置文件中读取特殊字符时遇到问题,例如德语变异的元音(例如,ä,ö,ü)。 From the PHP side I use the utf-8 encoding to bypass the problem: 从PHP方面,我使用utf-8编码来绕过该问题:

    utf8_encode ("Köln") //result: Köln

From the Python side I tried the decode function: 从Python方面,我尝试了解码功能:

    "Köln".decode("utf-8", "strict")

I expected the result "Köln" but just got the result "Köln" again. 我期望结果为“Köln”,但再次得到结果为“Köln”。 What do I have do to do to decode my String? 我该怎么做才能解码我的字符串?

Try these lines added on the top of your document: 尝试在文档顶部添加以下行:

# -*- coding: latin-1 -*-
# Encoding schema https://www.python.org/dev/peps/pep-0263

This may help you, more documentation here 这可能对您有帮助,请在此处获取更多文档

在这种情况下,您应该在.py文件的第一行中添加#-*- coding: UTF-8 -*-

In Python3, all text is in unicode. 在Python3中,所有文本均为unicode。 So I would recommend, on your PHP side, converting the string to unicode (outputting as u'K\\xf6ln'). 因此,我建议在您的PHP方面,将字符串转换为unicode(输出为u'K \\ xf6ln')。 After you've done this, you can convert it back to (sort of) it's original form in Python, however, the mutated vowel will be destroyed. 完成此操作后,您可以将其转换回Python中的原始形式(某种程度上),但是,变异的元音将被销毁。

import unicodedata

unicodetext = u'K\xf6ln'
output = unicodedata.normalize('NFKD', unicodetext).encode('ascii', 'ignore')

This will output a lonesome Koln, without the rather pretty mutation. 这将输出一个寂寞的科隆,而不会产生相当大的变异。 From my research, I can't find any way around this, but please, anybody who finds a more apt solution please comment 从我的研究中,我找不到任何解决方法,但是请任何发现更合适解决方案的人请发表评论

Thanks for all the helpfull answers and comments. 感谢您提供所有有用的答案和评论。 I finally ended up with the following solution: 我最终得到以下解决方案:

On the PHP side I encode my string with the following command: 在PHP方面,我使用以下命令对字符串进行编码:

$str = "path/to/file/Köln.jpg";
json_encode ($str, JSON_UNESCAPED_SLASHES);

The result is the string "path/to/file/K\öln.jpg" which is then stored in my config file. 结果是字符串“ path / to / file / K \\ u00f6ln.jpg”,然后将其存储在我的配置文件中。 The Python module uses ConfigParser to read the file. Python模块使用ConfigParser读取文件。 The encoded string is then decoded with the following command: 然后使用以下命令对编码后的字符串进行解码:

str.encode('utf8').decode('utf8')

The result is again "path/to/file/Köln.jpg". 结果还是“ path / to / file /Köln.jpg”。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM