
Make list of unicode words that are in a file

My code is:

import codecs

# Open the file and decode its contents as UTF-8
f = codecs.open(r'C:\Users\Admin\Desktop\nepali.txt', 'r', 'UTF-8')
nepali = f.read().split()
for i in nepali:
    print i

This displays the words in the file:

यो
किताब
टेबुल
मा
छ
यो
एक
किताब
हो
केटा

But when I try to create a list of the words with this code:

file=codecs.open(r'C:\Users\Admin\Desktop\nepali.txt', 'r', 'UTF-8')
nepali = list(file.read().split())
print nepali

the output is now displayed like this:

[u'\ufeff\u092f\u094b', u'\u0915\u093f\u0924\u093e\u092c', u'\u091f\u0947\u092c\u0941\u0932', u'\u092e\u093e', u'\u091b', u'\u092f\u094b', u'\u090f\u0915', u'\u0915\u093f\u0924\u093e\u092c', u'\u0939\u094b',]

The output should look like this:

[यो, किताब, टेबुल, मा, छ,यो, एक, किताब, हो]

You are looking at the output of the repr() function, which is always used to display the contents of containers. That output is meant for debugging, not for end-user display. In Python 2, every non-printable or non-ASCII codepoint in a unicode string is represented by an escape sequence. Depending on the codepoint, this is either a single-character escape such as \t or \n, or an escape with 2, 4, or 8 hex digits, such as \xe5, \u2603 or \U0001f4e2.
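To make the difference between the two representations concrete, here is a minimal sketch. It assumes Python 3, where repr() leaves printable non-ASCII characters alone, so the built-in ascii() is used to reproduce the escaped form that Python 2's repr() shows (without the u prefix). The sample words are taken from the question; the variable names are only illustrative.

```python
# Two of the words from the question, written with escapes.
words = ['\u092f\u094b', '\u0915\u093f\u0924\u093e\u092c']

# str() of a list calls repr() on each element, not str().
list_text = str(words)
same_as_repr = list_text == '[' + ', '.join(repr(w) for w in words) + ']'

# ascii() escapes every non-ASCII codepoint, much as Python 2's repr() does.
escaped = ascii(words[0])

# The escape form depends on the codepoint: \t, \xhh, \uhhhh or \Uhhhhhhhh.
escapes = [ascii(c) for c in ('\t', '\xe5', '\u2603', '\U0001f4e2')]

print(same_as_repr)
print(escaped)
print(escapes)
```

The strings themselves are unchanged in every case; only the way they are displayed differs.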

You'll have to produce the output manually:

print u'[{}]'.format(u', '.join(nepali))

This produces a unicode string formatted to look like a list object, but without using repr(): the strings are joined with ', ' (a comma and a space) and square brackets are added around the result.

Demo:

>>> nepali = [u'\ufeff\u092f\u094b', u'\u0915\u093f\u0924\u093e\u092c', u'\u091f\u0947\u092c\u0941\u0932', u'\u092e\u093e', u'\u091b', u'\u092f\u094b', u'\u090f\u0915', u'\u0915\u093f\u0924\u093e\u092c', u'\u0939\u094b',]
>>> print u'[{}]'.format(u', '.join(nepali))
[यो, किताब, टेबुल, मा, छ, यो, एक, किताब, हो]
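For completeness, the reading and formatting steps can be combined into one script. The following is only a sketch under a few assumptions: it uses io.open, which behaves the same in Python 2 and Python 3, it writes its own small temporary file in place of nepali.txt, and read_words and format_words are hypothetical helper names that do not appear in the question.

```python
import io
import os
import tempfile


def read_words(path):
    # Decode the file as UTF-8 and split its contents on whitespace.
    with io.open(path, 'r', encoding='utf-8') as f:
        return f.read().split()


def format_words(words):
    # Join the strings directly so that repr() is never involved.
    return u'[{}]'.format(u', '.join(words))


# Write a small sample file so that the example is self-contained.
sample = u'\u092f\u094b \u0915\u093f\u0924\u093e\u092c\n\u091f\u0947\u092c\u0941\u0932\n'
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(sample)
    formatted = format_words(read_words(path))
finally:
    os.remove(path)
```

After this runs, formatted holds the three sample words separated by ', ' and wrapped in square brackets, ready to be printed on a terminal that can display Devanagari.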

However, if you want to show this to an end user, why use the square brackets at all?
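Two bracket-free alternatives might look like the sketch below. The word list is a shortened copy of the one in the question, and the names inline and per_line are illustrative only.

```python
words = [u'\u092f\u094b', u'\u0915\u093f\u0924\u093e\u092c', u'\u091f\u0947\u092c\u0941\u0932']

# Comma-separated on a single line, without any list punctuation.
inline = u', '.join(words)

# One word per line, matching the output of the loop in the question.
per_line = u'\n'.join(words)
```

Either string can be passed to print directly, because it is a plain unicode string and not a container.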
