
How to read Unicode escapes from a text file and write the corresponding strings to an Excel file using Python

I have a file that contains Japanese text written as Unicode escape sequences. I would like to read the escapes from the file and write the corresponding characters (strings) to another file.

The contents of the file look like this:

\u6C0F\u540D 
\u7BA1\u7406\u8005\u540D
\u4F4F\u6240
\u96FB\u8A71\u756A\u53F7
\u30E1\u30FC\u30EB\u30A2\u30C9\u30EC\u30B9

Specifically, I want to generate an Excel file that contains the actual characters these escape sequences represent.

If you have a file called japanese.txt with the following contents:

\u6C0F\u540D 
\u7BA1\u7406\u8005\u540D
\u4F4F\u6240
\u96FB\u8A71\u756A\u53F7
\u30E1\u30FC\u30EB\u30A2\u30C9\u30EC\u30B9

You could decode the escapes and write the text to an Excel file with openpyxl, using the following code:

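import codecs
from openpyxl import Workbook

# Read the raw text; the file holds literal \uXXXX escapes, not the characters.
with codecs.open('japanese.txt', 'r', encoding='utf8') as file:
    s = file.read()

# A Python 3 str has no decode(), so go through bytes to interpret the escapes.
# This assumes the file contains only ASCII, as in the sample above.
s = s.encode('ascii').decode('unicode-escape')

wb = Workbook()
ws = wb.active

# Write each decoded, non-empty line into its own row of column A.
for line in s.splitlines():
    if line.strip():
        ws.append([line.strip()])

wb.save("sample.xlsx")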
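
If the file might also contain real non-ASCII characters or other backslashes, the unicode-escape codec can be too aggressive. Here is a minimal sketch of an alternative that only touches \uXXXX sequences, using the standard library. The helper names decode_escapes and decode_lines are made up for illustration, and the sketch assumes every escape has exactly four hex digits (characters outside the Basic Multilingual Plane, written as surrogate pairs, are not handled).

```python
import re

# Matches a literal backslash-u followed by exactly four hex digits.
_ESCAPE = re.compile(r'\\u([0-9A-Fa-f]{4})')


def decode_escapes(text):
    """Replace each literal backslash-u escape in text with its character."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def decode_lines(raw):
    """Decode every non-empty line of raw and return the results as a list."""
    return [decode_escapes(line.strip())
            for line in raw.splitlines()
            if line.strip()]


# Two lines taken from the sample file, as they would be read from disk.
sample = "\\u6C0F\\u540D \n\\u4F4F\\u6240\n"
rows = decode_lines(sample)
print(rows)
```

Each element of rows could then be passed to ws.append([...]) to fill one spreadsheet row per line.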

Another option is the unidecode package. Note that it transliterates Unicode text into an ASCII approximation rather than producing the Japanese characters themselves. For instance:

>>> from unidecode import unidecode
>>> print(unidecode(u"\u6C0F\u540D"))
Shi Ming
