I have a file that contains Japanese text written as Unicode escape sequences. I would like to read those escapes from the file and write the corresponding actual characters (as a string) to another file.
The escape sequences in the file look like this:
\u6C0F\u540D \u7BA1\u7406\u8005\u540D \u4F4F\u6240 \u96FB\u8A71\u756A\u53F7 \u30E1\u30FC\u30EB\u30A2\u30C9\u30EC\u30B9
Ultimately, I want to generate an Excel file that contains the actual characters these escape sequences represent.
If you have a file called japanese.txt with the following contents:
\u6C0F\u540D
\u7BA1\u7406\u8005\u540D
\u4F4F\u6240
\u96FB\u8A71\u756A\u53F7
\u30E1\u30FC\u30EB\u30A2\u30C9\u30EC\u30B9
You could add it to an Excel file with openpyxl, using the following code:
# -*- coding: utf-8 -*-
from openpyxl import Workbook

# Read the raw bytes, then turn the literal \uXXXX escapes into real characters.
# (Calling s.decode() on text that is already decoded fails on Python 3.)
with open('japanese.txt', 'rb') as f:
    s = f.read().decode('unicode-escape')

wb = Workbook()
ws = wb.active
ws['A1'] = s  # the whole decoded text goes into a single cell
wb.save("sample.xlsx")
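If you would rather have one label per row, or the file might also contain ordinary non-ASCII text that the unicode-escape codec would misread, a regex-based decoder is one possible alternative. This is only a sketch: the helper names decode_escapes and decode_lines are made up for illustration, and it assumes every escape is a four-digit \uXXXX sequence in the Basic Multilingual Plane, as in the sample above.

```python
import re

# Matches a literal backslash, the letter u, and exactly four hex digits.
_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')

def decode_escapes(text):
    """Replace each four-digit backslash-u escape with the character it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

def decode_lines(text):
    """Decode every non-empty line and return the results as a list of strings."""
    return [decode_escapes(line) for line in text.splitlines() if line.strip()]

# Two lines taken from the sample file contents (hypothetical inline input).
sample = '\\u6C0F\\u540D\n\\u4F4F\\u6240\n'
labels = decode_lines(sample)
print(labels)
```

Each item in labels could then be written to its own row with ws.append([label]) before saving the workbook.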
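There is also a package called unidecode that may be useful. Note that it transliterates Unicode text into an ASCII approximation rather than producing the Japanese characters themselves, so it only fits if plain ASCII output is what you need. For instance: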
>>> from unidecode import unidecode
>>> print(unidecode(u"\u6C0F\u540D"))
Shi Ming