Printing contents of a tsv file (with UTF-8) in Python
The code I have below works fine in a file I've named tsv_test.py:
import csv

class ReadUTF8():
    def unicode_csv_reader(self, utf8_data, dialect=csv.excel_tab, **kwargs):
        csv_reader = csv.reader(utf8_data, dialect=dialect, **kwargs)
        for row in csv_reader:
            yield [unicode(cell, 'utf-8') for cell in row]

    def load_deck_data(self):
        filename = 'lexicon.tsv'
        reader = self.unicode_csv_reader(open(filename))
        for field1, field2, field3, field4 in reader:
            print field1, field2, field3, field4

ReadUTF8().load_deck_data()
But when I copy/paste it into my project (this is a Kivy project), it breaks. Code and error below:
class StudyScreenManagement(ScreenManager):
    def unicode_csv_reader(self, utf8_data, dialect=csv.excel_tab, **kwargs):
        csv_reader = csv.reader(utf8_data, dialect=dialect, **kwargs)
        for row in csv_reader:
            yield [unicode(cell, 'utf-8') for cell in row]

    def load_deck_data(self):
        filename = 'lexicon.tsv'
        reader = self.unicode_csv_reader(open(filename))
        for field1, field2, field3, field4 in reader:
            print field1, field2, field3, field4
I doubt this is related, but just in case, the related .kv file:
Button:
    text: 'Lexicon'
    on_press: app.root.load_deck_data()
Output:
File "/Users/bearnun/code/mingyu/mingyuKivy/mingyu_controllers.py", line 14, in load_deck_data
for field1, field2, field3, field4 in reader:
ValueError: need more than 1 value to unpack
::Side Note::
I tried just printing 'field1' in both cases. With that change the output for both is:
[u'\u4b03', u'\u98d2', u'[sa4]', u'/variant of \u98af|\u98d2[sa4]/']
[u'\u4b20', u'\u4b20', u'[fei1]', u'/old variant of \u970f[fei1]/']
My desired output:
䬃 飒 [sa4] /variant of 颯|飒[sa4]/
䬠 䬠 [fei1] /old variant of 霏[fei1]/
[EDIT BELOW]
lexicon.tsv contents:
䬃 飒 [sa4] /variant of 颯|飒[sa4]/
䬠 䬠 [fei1] /old variant of 霏[fei1]/
Apparently, I am receiving a list instead of a generator, so if in load_deck_data() I change:
for field1, field2, field3, field4 in reader:
    print field1, field2, field3, field4
to:
for line in reader:
    print ''.join(line)
my project works fine.
Check out this example:
data = [
    ['a', 'b', 'c', 'd'],
    ['e'],
]

def mygen(x):
    for item in x:
        yield item

for line in mygen(data):
    print ''.join(line)

--output:--
abcd
e

for col1, col2, col3, col4 in mygen(data):
    print col1, col2, col3, col4

--output:--
a b c d
Traceback (most recent call last):
  File "1.py", line 13, in <module>
    for col1, col2, col3, col4 in mygen(data):
ValueError: need more than 1 value to unpack
In the first for-in loop, you are asking, "Please retrieve all the elements in the list and join them together." In the second for-in loop, you are demanding, "Retrieve four elements from the list!" See the difference? In the first case, the list can contain 0 to n elements and there won't be an error. In the second case, the list has to have exactly 4 elements; otherwise there will be an error.
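The unpacking rule can be sketched in Python 3 (the code in this thread is Python 2, so print is a function here). The helper name split_rows and the extra five-item row are hypothetical, added only for illustration: the helper sorts rows by whether they can be unpacked into four names, and shows that a row with too many items fails just as a row with too few does.

```python
def split_rows(rows, expected=4):
    """Separate rows that have exactly `expected` items from those that do not."""
    good, bad = [], []
    for row in rows:
        # Tuple unpacking needs exactly `expected` items, no more and no fewer.
        if len(row) == expected:
            good.append(row)
        else:
            bad.append(row)
    return good, bad


data = [
    ['a', 'b', 'c', 'd'],
    ['e'],
    ['f', 'g', 'h', 'i', 'j'],  # hypothetical extra row with too many items
]

good, bad = split_rows(data)

# Safe: every row in `good` has exactly four items.
for col1, col2, col3, col4 in good:
    print(col1, col2, col3, col4)

# Rows that would have raised ValueError during unpacking.
for row in bad:
    print('cannot unpack into 4 names:', row)
```

Checking the length first is one possible design; catching ValueError around the unpacking would work as well.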
I would love to know why I'm getting a generator in one place, but a list in another.
Simple. You aren't.
csv.reader() returns a list of strings for each row, which means your generator function returns a list of strings for each iteration.
I think you changed the data in your file. In one file, you have tab-delimited data, and csv.reader() returns a list of four things for each line in your file, which can be unpacked into four variables. Your other file has non-tab-delimited data, which causes csv.reader() to read the whole line as one item, so the list of strings that csv.reader() returns contains only one item, and a one-item list cannot be unpacked into four variables.
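A small Python 3 sketch of that behaviour, using in-memory text instead of a file. The sample strings and the helper name row_lengths are assumptions made for illustration; only csv.reader and the csv.excel_tab dialect come from the code above.

```python
import csv
import io

tabbed = "a\tb\tc\td\n"   # fields separated by tabs
spaced = "a b c d\n"      # fields separated by spaces, no tabs at all


def row_lengths(text):
    """Return the number of fields csv.reader finds in each line of `text`."""
    reader = csv.reader(io.StringIO(text), dialect=csv.excel_tab)
    return [len(row) for row in reader]


print(row_lengths(tabbed))  # one row with four fields
print(row_lengths(spaced))  # one row with a single field: the whole line
```

Printing len(row) for each row in the real project is a quick way to confirm whether the file being read is actually tab-delimited.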
I tried just printing 'field1' in both cases. With that change the output for both is:
[u'\u4b03', u'\u98d2', u'[sa4]', u'/variant of \u98af|\u98d2[sa4]/']
[u'\u4b20', u'\u4b20', u'[fei1]', u'/old variant of \u970f[fei1]/']
Instead of doing print field1, if you do print repr(field1), I suspect you will get:
"[u'\u4b03', u'\u98d2', u'[sa4]', u'/variant of \u98af|\u98d2[sa4]/']"
Note the outer quotes, which means your tsv file literally has the following on one line:
[䬃, 飒, [sa4], /variant of 颯|飒[sa4]/]
with no tabs separating anything, so the whole line-that-looks-like-a-list is read in as one item, and therefore csv.reader() returns a list containing that one item. You were fooled into thinking the single item was a Python list because when you print a string, Python does not display the quotes. For example, there is no difference in the output for the following two print statements:
>>> print "[1, 2, 3]"
[1, 2, 3]
>>> print [1, 2, 3]
[1, 2, 3]
print can fool you in other situations as well because a string can contain unprintable characters, which the output of print won't reveal:
>>> print "hello\bworld"
hellworld
The bottom line is: you can never know what the original thing was by looking at the output of print. Whenever you want to know exactly what the original thing is, always use:
print repr(some_string)
Now, look at the results:
>>> print repr([1, 2, 3])
[1, 2, 3]
>>> print repr('[1, 2, 3]')
'[1, 2, 3]'
>>> print repr('hello\bworld')
'hello\x08world'
The output tells you exactly what the original thing was.
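The same check can be sketched in Python 3, where print is a function and the u prefix no longer appears in repr output. The variable names are hypothetical; the point is only that str() hides the difference between a list and a string that looks like one, while repr() and type() reveal it.

```python
as_list = [1, 2, 3]
as_string = "[1, 2, 3]"

# Plain printing shows the same text for both objects.
print(as_list)
print(as_string)

# repr() keeps the quotes on the string, so the two can be told apart.
print(repr(as_list))
print(repr(as_string))

# type() answers the question directly.
print(type(as_list).__name__, type(as_string).__name__)

# Unprintable characters also become visible as escapes.
print(repr('hello\bworld'))
```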
With the following tab-delimited lexicon.tsv file:
1 2 3 €
䬃 飒 [sa4] /variant of 颯|飒[sa4]/
the code below causes no errors after clicking on the Lexicon button:
from kivy.app import App
from kivy.uix.screenmanager import ScreenManager, Screen
import csv

class StudyScreenManager(ScreenManager):
    def unicode_csv_reader(self, utf8_data, dialect=csv.excel_tab, **kwargs):
        csv_reader = csv.reader(utf8_data, dialect=dialect, **kwargs)
        for row in csv_reader:
            yield [unicode(cell, 'utf-8') for cell in row]

    def load_deck_data(self):
        filename = 'lexicon.tsv'
        reader = self.unicode_csv_reader(open(filename))
        for field1, field2, field3, field4 in reader:
            print field1, field2, field3, field4

class HistoryScreen(Screen):
    pass

class MathScreen(Screen):
    pass

class MyApp(App):
    def build(self):
        sm = StudyScreenManager()
        sm.add_widget(HistoryScreen(name='history'))
        sm.add_widget(MathScreen(name='math'))
        return sm

MyApp().run()
my.kv:
<HistoryScreen>: #the 'root' of the following widget hierarchy:
    BoxLayout:
        Button:
            text: 'Lexicon'
            on_press: app.root.load_deck_data() #self=Button, root=HistoryScreen, app.root=the Widget returned by build()
        Button:
            text: "Next"
            on_press: root.manager.current = "math"

<MathScreen>: #the 'root' of the following widget hierarchy:
    BoxLayout:
        Button:
            text: 'Lexicon'
            on_press: app.root.load_deck_data()
        Button:
            text: 'Previous'
            on_press: root.manager.current = "history"
After clicking on the Lexicon button, here is the output I see in my UTF-8 aware terminal window:
1 2 3 €
䬃 飒 [sa4] /variant of 颯|飒[sa4]/
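The code in this thread is Python 2, where csv.reader yields byte strings that have to be decoded by hand. As a hedged aside, a Python 3 version might look like the sketch below: the file is opened in text mode with an explicit encoding, so the rows already contain str objects and no unicode() call is needed. The function name read_tsv and the temporary sample file are assumptions made so the example is self-contained; they are not part of the original project.

```python
import csv
import os
import tempfile


def read_tsv(path):
    """Yield each row of a UTF-8, tab-delimited file as a list of str."""
    # newline='' is what the csv module documentation recommends for csv files.
    with open(path, encoding='utf-8', newline='') as handle:
        for row in csv.reader(handle, dialect=csv.excel_tab):
            yield row


# Write a small sample file (same contents as the lexicon.tsv shown above).
sample = "1\t2\t3\t€\n䬃\t飒\t[sa4]\t/variant of 颯|飒[sa4]/\n"
with tempfile.NamedTemporaryFile('w', encoding='utf-8', newline='',
                                 suffix='.tsv', delete=False) as tmp:
    tmp.write(sample)

rows = list(read_tsv(tmp.name))
os.remove(tmp.name)

# Each row has exactly four fields, so unpacking into four names succeeds.
for field1, field2, field3, field4 in rows:
    print(field1, field2, field3, field4)
```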
Apparently, I am receiving a list instead of a generator, so if in load_deck_data() I change...:
for field1, field2, field3, field4 in reader:
    print field1, field2, field3, field4
...to...:
for line in reader:
    print ''.join(line)
...my project works fine. This, of course, does not work in the small code snippet that originally worked.
I would love to know why I'm getting a generator in one place, but a list in another. :)