Converting from utf-16 to utf-8 in Python 3

Question

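I am programming in Python 3 and have run into a small problem that I can't find any reference to on the net.

As far as I understand, the default string is utf-16, but I have to work with utf-8, and I can't find a command that converts the default string to utf-8. I'd appreciate your help very much.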

Answer 1

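In Python 3, two data types matter when you work with strings. The first is the str class, an object that represents Unicode code points. The important point is that a str is not bytes but a sequence of characters. The second is the bytes class, which is simply a sequence of bytes, often representing a string stored in some encoding (such as utf-8 or iso-8859-15).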
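
To make the distinction concrete, here is a minimal sketch. The sample word is only an illustrative value, not something from the question; it shows that the same str has one length in characters but different lengths in bytes depending on the encoding chosen.

```python
# A str is a sequence of Unicode code points.
text = 'ćma'

# Encoding turns the str into bytes; the byte count depends on the encoding.
as_utf8 = text.encode('utf-8')
as_utf16 = text.encode('utf-16-le')  # little-endian, no byte order mark

print(type(text), len(text))
print(type(as_utf8), len(as_utf8))
print(type(as_utf16), len(as_utf16))
```

The str itself has no encoding attached; an encoding only comes into play when you turn it into bytes or back.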

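What does this mean for you? As far as I understand, you want to read and write utf-8 files. Let's write a program that replaces every 'ć' character with 'ç':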

def main():
    # Open the output file first. Passing encoding='utf-8' tells Python that
    # everything we print to this file should be encoded as utf-8.
    with open('output_file', 'w', encoding='utf-8') as out_file:
        # Give open() the encoding for the input as well, so each line we read is a Unicode string.
        with open('input_file', encoding='utf-8') as in_file:
            for line in in_file:
                # String literals in Python source are Unicode strings too, so no encoding worries here.
                # Each line already ends with its newline, hence end=''; file= sends the output to out_file.
                print(line.replace('ć', 'ç'), end='', file=out_file)
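
If the input really is stored as utf-16 (as the question title suggests), the same pattern should work with a different encoding on the reading side. The following is only a sketch under that assumption: convert_file and its parameter names are hypothetical, and newline='' is used so line endings pass through unchanged.

```python
def convert_file(src_path, dst_path):
    # Hypothetical helper: read text as UTF-16 and write it back out as UTF-8.
    # newline='' disables newline translation in both directions.
    with open(src_path, encoding='utf-16', newline='') as src, \
         open(dst_path, 'w', encoding='utf-8', newline='') as dst:
        for line in src:
            dst.write(line)
```

A call would look like convert_file('input_file', 'output_file'). The 'utf-16' codec uses the byte order mark at the start of the file to determine the byte order.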

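So when should you use bytes? Not often. One example I can think of is reading from a socket. If you have the data in a bytes object, you can turn it into a Unicode string by calling bytes.decode('encoding'), and go the other way with str.encode('encoding'). But as said, you probably won't need it.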
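
As a sketch of that round trip applied to the original question: if you do end up holding utf-16 encoded bytes, you can decode them to a str and encode that str as utf-8. The function name utf16_to_utf8 and the sample data are made up for illustration.

```python
def utf16_to_utf8(data: bytes) -> bytes:
    """Decode UTF-16 bytes to a str, then encode that str as UTF-8."""
    return data.decode('utf-16').encode('utf-8')


# Stand-in for bytes received from elsewhere (e.g. a socket).
raw = 'ça'.encode('utf-16')
converted = utf16_to_utf8(raw)
print(converted)
```

The str in the middle is the key step: there is no direct bytes-to-bytes conversion here, only decode followed by encode.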

Still, because it is interesting, here is the hard way, where you encode everything yourself:

def main():
    # Open the output file in binary mode, so we write bytes to it instead of strings.
    with open('output_file', 'wb') as out_file:
        # Read every line. The input is opened in binary mode too, so we get bytes.
        with open('input_file', 'rb') as in_file:
            for line_bytes in in_file:
                # Convert the bytes to a string.
                line_string = line_bytes.decode('utf-8')
                # Replace the characters we want.
                line_string = line_string.replace('ć', 'ç')
                # Encode the string back to bytes.
                out_bytes = line_string.encode('utf-8')
                # Write the bytes; print() cannot write to a file opened in binary mode.
                out_file.write(out_bytes)

A very good read on this subject (string encodings) is http://www.joelonsoftware.com/articles/Unicode.html. Really recommended!

Source: http://docs.python.org/release/3.0.1/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit

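(PS: as you can see, I didn't mention utf-16 in this post. I don't actually know whether Python uses it as its internal encoding, but that is entirely irrelevant. While you are working with a string, you work with characters (code points), not bytes.)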

Solution 1
6 2010-06-29 11:40:02
