How to correctly read Japanese characters from a file without the escape sequences “\ufeff” and “\u3000” in the strings?
I have the following Japanese text, which I have to split into strings by line ('\n'). The text file is called 'sonnet.txt':
さよなら夜の教室
Then I open the file and split the text into a list of lines:
file = open('sonnet.txt', encoding="utf-8")
jP = file.read().split('\n')
I get the following result for the list at the Python prompt:
>>> jP
['\ufeffさよなら\u3000夜の教室',]
Is there a way to get rid of the "\ufeff" and "\u3000" parts, not just for this stored value, but in general for other words as well? Thank you.
Actually, I ran your code with a sonnet.txt text file I created, but I didn't get the same result.
My output was: ['さよなら夜の教室']
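One likely explanation for the difference is how each file was saved: some editors write a byte order mark (BOM) at the start of a UTF-8 file, and the text may or may not contain an ideographic space (U+3000). The sketch below is a hypothetical reproduction, not the original files. The file names and the temporary directory are assumptions made for illustration. It writes the same line with and without a BOM and reads both back with encoding="utf-8".

```python
import os
import tempfile

# Hypothetical sample line; the ideographic space (U+3000) is written explicitly.
line = "さよなら\u3000夜の教室\n"

tmp_dir = tempfile.mkdtemp()
with_bom = os.path.join(tmp_dir, "with_bom.txt")
without_bom = os.path.join(tmp_dir, "without_bom.txt")

# "utf-8-sig" writes a BOM (bytes EF BB BF) before the text; "utf-8" does not.
with open(with_bom, "w", encoding="utf-8-sig") as f:
    f.write(line)
with open(without_bom, "w", encoding="utf-8") as f:
    f.write(line)

# Reading with plain "utf-8" keeps the BOM as the character U+FEFF.
with open(with_bom, encoding="utf-8") as f:
    lines_bom = f.read().split("\n")
with open(without_bom, encoding="utf-8") as f:
    lines_plain = f.read().split("\n")

print(lines_bom)       # list display uses repr(), which escapes U+FEFF and U+3000
print(lines_plain)     # no BOM here, but U+3000 is still shown escaped
print(lines_plain[0])  # print() of the string itself shows a wide blank instead
```

Note that the "\u3000" in the list display is only how repr() shows the character. The string holds a single full-width space, not a backslash followed by letters.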
By the way, I suggest doing it like this:
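# Close the file automatically once the text has been read
with open('sonnet.txt', encoding="utf-8") as file:
    # Drop the BOM and the ideographic space, then split into lines
    jP = file.read().replace('\ufeff', '').replace('\u3000', '').split('\n')
print(jP)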
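A possible alternative, offered as a sketch rather than the definitive fix: Python's built-in "utf-8-sig" codec decodes UTF-8 and skips a leading BOM if there is one, so '\ufeff' does not need to be replaced by hand. The helper name read_lines and the temporary demo file are assumptions made for illustration. Turning U+3000 into an ordinary space, instead of deleting it, is also an assumption about what is wanted.

```python
import os
import tempfile


def read_lines(path):
    """Hypothetical helper: read a UTF-8 text file into a list of lines."""
    # "utf-8-sig" behaves like "utf-8" but drops a BOM at the start of the file.
    with open(path, encoding="utf-8-sig") as f:
        text = f.read()
    # Assumption: keep the word boundary by using an ordinary space;
    # pass '' as the second argument to remove the full-width space entirely.
    return text.replace("\u3000", " ").splitlines()


# Demo file (hypothetical): saved with a BOM and an ideographic space.
demo_path = os.path.join(tempfile.mkdtemp(), "sonnet.txt")
with open(demo_path, "w", encoding="utf-8-sig") as f:
    f.write("さよなら\u3000夜の教室\n")

jP = read_lines(demo_path)
print(jP)
```

Unlike split('\n'), splitlines() does not leave an empty string at the end when the file ends with a newline. It also splits on a few other line-boundary characters, which may or may not be what you want.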
More info:
Eliminate the “\ ” error
Unicode Character 'IDEOGRAPHIC SPACE' (U+3000)