简体   繁体   English

将字符串转换为utf-16

[英]Convert string to utf-16

I have a text file with japanese characters. 我有一个带有日语字符的文本文件。 I read a line from it and want to convert it to utf-16 specifically. 我从中读取了一行,并希望将其专门转换为utf-16。 How can I do it using Python? 如何使用Python做到这一点? My code looks like this - 我的代码看起来像这样-

with open("C:\\Users\\badri\\jap.txt", 'rb') as f:
    for line in f:
        u = line.decode(encoding='utf-16',errors='strict')

I get this error "LookupError: unknown encoding: utf-16" 我收到此错误“ LookupError:未知编码:utf-16”

The reason is I want it in utf-16 is because words are separated by spaces and so doesn't matter what language the text file is in. I would be able to use space as a delimiter and count the number of words in the file. 原因是我希望在utf-16中使用它是因为单词之间用空格隔开,所以文本文件所使用的语言无关紧要。我将能够使用空格作为分隔符并计算文件中单词的数量。

Once separated, I can easily print them this way - 分离后,我可以轻松地以这种方式打印它们-

u1 = u'\u0048\u0065\u006c\u006c\u006f'
u2 = u'\u0077\u006f\u0072\u006c\u0064'
u3 = u'\u3053\u3093\u306b\u3061\u306f\u4e16\u754c'
print u1
print u2
print u3

Hello
world
こんにちは世界

This depends entirely on the encoding of the file. 这完全取决于文件的编码。

Either way, you need to decode the line first, and then re-encode it so that it's utf-16. 无论哪种方式,您都需要先解码该行,然后再对其进行编码,以使其为utf-16。

with open(file_path, "r") as fh:
    for line in fh:
        string = line.decode("utf-8").encode("utf-16")

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM