
Convert string to utf-16

I have a text file containing Japanese characters. I read it line by line and want to convert each line to UTF-16 specifically. How can I do this in Python? My code looks like this:

with open("C:\\Users\\badri\\jap.txt", 'rb') as f:
    for line in f:
        u = line.decode(encoding='utf-16',errors='strict')

I get this error: "LookupError: unknown encoding: utf-16"

The reason I want UTF-16 is that words are separated by spaces, so it doesn't matter what language the text file is in. I would be able to use the space as a delimiter and count the number of words in the file.

Once separated, I can easily print them this way:

u1 = u'\u0048\u0065\u006c\u006c\u006f'
u2 = u'\u0077\u006f\u0072\u006c\u0064'
u3 = u'\u3053\u3093\u306b\u3061\u306f\u4e16\u754c'
print u1
print u2
print u3

Hello
world
こんにちは世界

This depends entirely on the encoding of the file.

Either way, you need to decode each line from the file's actual encoding first (UTF-8 is assumed below), and then re-encode it as UTF-16.

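# file_path is the path to your text file; "rb" yields bytes, which have .decode()
with open(file_path, "rb") as fh:
    for line in fh:
        # Decode from the source encoding (UTF-8 assumed), then re-encode as UTF-16
        string = line.decode("utf-8").encode("utf-16")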
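
The decode-then-encode step above can be sketched as a small self-contained Python 3 example. It assumes the source bytes are UTF-8. The helper names `to_utf16` and `count_words` and the sample string are hypothetical, chosen only for illustration.

```python
def to_utf16(raw_bytes, source_encoding="utf-8"):
    """Decode bytes from the source encoding, then re-encode as UTF-16."""
    text = raw_bytes.decode(source_encoding)
    # The "utf-16" codec prepends a byte order mark (BOM) when encoding.
    return text.encode("utf-16")


def count_words(raw_bytes, source_encoding="utf-8"):
    """Count whitespace-separated words in the decoded text."""
    text = raw_bytes.decode(source_encoding)
    # str.split() with no argument splits on any run of Unicode whitespace.
    return len(text.split())


# Sample line standing in for one line read from a file opened in "rb" mode.
sample = "Hello world こんにちは 世界\n".encode("utf-8")

utf16_bytes = to_utf16(sample)
print(count_words(sample))                 # number of space-separated words
print(utf16_bytes.decode("utf-16"))        # round-trips to the original text
```

Note that the word count is taken from the decoded text, so the conversion to UTF-16 is not needed for splitting on spaces; it matters only if the output bytes must be UTF-16.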


 