I have a text file with Japanese characters. I read a line from it and want to convert it to UTF-16 specifically. How can I do this in Python? My code looks like this:
with open("C:\\Users\\badri\\jap.txt", 'rb') as f:
    for line in f:
        u = line.decode(encoding='utf-16', errors='strict')
I get this error: "LookupError: unknown encoding: utf-16".
The reason I want it in UTF-16 is that words are separated by spaces, so it doesn't matter what language the text file is in: I would be able to use the space as a delimiter and count the number of words in the file.
Once the words are separated, I can easily print them like this:
u1 = u'\u0048\u0065\u006c\u006c\u006f'
u2 = u'\u0077\u006f\u0072\u006c\u0064'
u3 = u'\u3053\u3093\u306b\u3061\u306f\u4e16\u754c'
print(u1)
print(u2)
print(u3)
Hello
world
こんにちは世界
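The word count described in the question can be sketched as follows. This is a minimal Python 3 sketch, not the asker's code: it assumes the file's encoding is known (UTF-8 is used as a default guess), and `count_words` is a hypothetical helper name. Once the bytes are decoded to `str`, splitting on whitespace works the same no matter which byte encoding the file used.

```python
def count_words(path, encoding="utf-8"):
    """Count whitespace-separated words in a text file.

    Assumes the file is stored in `encoding`; UTF-8 is only a default guess.
    """
    total = 0
    # Text mode with an explicit encoding decodes each line to str as it is read.
    with open(path, "r", encoding=encoding) as fh:
        for line in fh:
            # split() with no argument splits on runs of any Unicode
            # whitespace, including the ideographic space U+3000.
            total += len(line.split())
    return total
```

This counts only space-delimited tokens; Japanese prose is often written without spaces between words, so an unbroken run of characters counts as one word.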
This depends entirely on the encoding of the file.
Either way, you need to decode the line first, and then re-encode it as UTF-16.
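# Open in binary mode so each line is a byte string (works on Python 2 and 3).
# file_path is a placeholder; "utf-8" assumes that is the file's encoding.
with open(file_path, "rb") as fh:
    for line in fh:
        string = line.decode("utf-8").encode("utf-16")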
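On Python 3, `str` has no `decode` method, so the same idea is usually written by letting `open()` do the decoding and then calling `encode()` on each line. A minimal sketch, assuming the file is UTF-8; `to_utf16_lines` is a hypothetical name chosen for illustration:

```python
def to_utf16_lines(path, source_encoding="utf-8"):
    """Yield each line of a text file re-encoded as UTF-16 bytes.

    Assumes the file is stored in `source_encoding` (UTF-8 by default).
    """
    with open(path, "r", encoding=source_encoding) as fh:
        for line in fh:
            # line is already a decoded str here; encode it to UTF-16.
            # The "utf-16" codec prepends a byte order mark (BOM) to every
            # encoded string; "utf-16-le" or "utf-16-be" omit it.
            yield line.encode("utf-16")
```

Each yielded value is a `bytes` object; calling `.decode("utf-16")` on it gives back the original line.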