How do I read Japanese text from a file without the escape sequences “\ufeff” and “\u3000” showing up in the strings?

I have the following Japanese text, which I need to split into strings line by line (on '\n'). The file is called 'sonnet.txt'.

さよなら夜の教室

I open the file and split the text into a list of lines:

file = open('sonnet.txt', encoding="utf-8")
jP = file.read().split('\n')

This is what the Python prompt shows for the list:

>>> jP
['\ufeffさよなら\u3000夜の教室',]

Is there a way to get rid of the "\ufeff" and "\u3000" parts, not just for this stored value but in general, for other text as well? Thank you.

I ran your code against a sonnet.txt file that I created myself, but I didn't get the same result.

My output was: ['さよなら夜の教室']

In any case, I suggest removing both characters explicitly, like this:

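# Use a context manager so the file is closed after reading.
with open('sonnet.txt', encoding="utf-8") as file:
    # Remove the BOM and the ideographic space, then split into lines.
    jP = file.read().replace('\ufeff', '').replace('\u3000', '').split('\n')
print(jP)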
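A possible alternative for the "\ufeff" part, offered as a sketch rather than as the answerer's method: "\ufeff" at the start of a file is a byte order mark (BOM), and Python's "utf-8-sig" codec skips it while decoding. That would also explain the different results above, if the asker's file was saved with a BOM and the answerer's was not. The helper name read_lines and the temporary demo file are made up for illustration.

```python
import os
import tempfile

def read_lines(path):
    """Read a UTF-8 text file, dropping a leading BOM if one is present."""
    # 'utf-8-sig' also decodes plain UTF-8; it only skips a BOM at the start.
    with open(path, encoding="utf-8-sig") as f:
        return f.read().splitlines()

# Demo (hypothetical file): write a file that starts with a BOM, then read it back.
path = os.path.join(tempfile.mkdtemp(), "sonnet.txt")
with open(path, "w", encoding="utf-8-sig") as f:
    f.write("さよなら\u3000夜の教室\n")

lines = read_lines(path)
print(lines)     # the list display still shows \u3000 as an escape
print(lines[0])  # printing the string itself shows a full-width space
```

Unlike the replace() approach, this leaves the full-width space in place. It is a real character in the text, and the "\u3000" spelling only appears when Python displays the list.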

More info:

Eliminate the “\ufeff” error

Unicode Character 'IDEOGRAPHIC SPACE' (U+3000)
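
To make the two characters less mysterious, here is a small sketch using the standard unicodedata module. It assumes that the full-width space is meant to separate words and should therefore be used as a separator rather than deleted; if that is not what you want, replace it instead. The sample string is the one from the question.

```python
import unicodedata

line = "\ufeffさよなら\u3000夜の教室"

# Both escapes stand for single, real characters in the string.
print(unicodedata.name("\ufeff"))  # ZERO WIDTH NO-BREAK SPACE (used as the BOM)
print(unicodedata.name("\u3000"))  # IDEOGRAPHIC SPACE

# Drop the BOM, then split on whitespace.
cleaned = line.replace("\ufeff", "")
words = cleaned.split()  # with no argument, split() treats U+3000 as whitespace
print(words)
```

The escapes are only how Python displays these characters inside a list; print(cleaned) shows the text with an ordinary-looking wide space.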
