How do I fix the following error when writing a dictionary to a CSV file?
Hello, I am working with sklearn and using KMeans for natural language processing. I used KMeans to create clusters from comments, then built a dictionary with the cluster number as the key and the list of associated comments as the value, as follows:
dict_clusters = {}
for i in range(0, len(kmeans.labels_)):
    #print(kmeans.labels_[i])
    #print(listComments[i])
    if not kmeans.labels_[i] in dict_clusters:
        dict_clusters[kmeans.labels_[i]] = []
    dict_clusters[kmeans.labels_[i]].append(listComments[i])
print("dictionary constructed")
I would like to write this dictionary to a CSV file. I tried:
Out = open("dictionary.csv", "wb")
w = csv.DictWriter(Out,dict_clusters.keys())
w.writerows(dict_clusters)
Out.close()
However, I am not sure what is wrong, since I get the following error. I am also not sure whether the error is related to numpy, since kmeans.labels_ contains several values:
Traceback (most recent call last):
  File "C:/Users/CleanFile.py", line 133, in <module>
    w.writerows(dict_clusters)
  File "C:\Program Files\Anaconda3\lib\csv.py", line 156, in writerows
    return self.writer.writerows(map(self._dict_to_list, rowdicts))
  File "C:\Program Files\Anaconda3\lib\csv.py", line 146, in _dict_to_list
    wrong_fields = [k for k in rowdict if k not in self.fieldnames]
TypeError: 'numpy.int32' object is not iterable
I would appreciate help with this. I would like to get a CSV file from my dictionary that looks like this:
key1, value
key2, value
.
.
.
keyN, value
After feedback from here I tried:
with open("dictionary.csv", mode="wb") as out_file:
    writer = csv.DictWriter(out_file, headers=dict_clusters.keys())
    writer.writerow(dict_clusters)
I got:
Traceback (most recent call last):
  File "C:/Users/CleanFile.py", line 129, in <module>
    writer = csv.DictWriter(out_file, headers=dict_clusters.keys())
TypeError: __init__() missing 1 required positional argument: 'fieldnames'
Attempt 2:
Out = open("dictionary.csv", "wb")
w = csv.DictWriter(Out,dict_clusters.keys())
w.writerows([dict_clusters])
Out.close()
Output:
Traceback (most recent call last):
  File "C:/Users/CleanFile.py", line 130, in <module>
    w.writerows([dict_clusters])
  File "C:\Program Files\Anaconda3\lib\csv.py", line 156, in writerows
    return self.writer.writerows(map(self._dict_to_list, rowdicts))
TypeError: a bytes-like object is required, not 'str'
Attempt 3. This attempt takes a long time to compute the output:
Out = open("dictionary.csv", "wb")
w = csv.DictWriter(Out,dict_clusters.keys())
w.writerow(dict_clusters)
Out.close()
The version of Python that I am using is the following:
3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
3.5.2
After many attempts I decided to build my dictionary in a better way, as follows:
from collections import defaultdict
pairs = zip(y_pred, listComments)
dict_clusters2 = defaultdict(list)
for num, comment in pairs:
    dict_clusters2[num].append(comment)
However, it seems that some character makes the creation of the CSV file fail, as follows:
with open('dict.csv', 'w') as csv_file:
    writer = csv.writer(csv_file)
    for key, value in dict_clusters2.items():
        writer.writerow([key, value])
Output:
Traceback (most recent call last):
  File "C:/Users/CleanFile.py", line 146, in <module>
    writer.writerow([key, value])
  File "C:\Program Files\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f609' in position 6056: character maps to <undefined>
To be clearer, I ran:
for k,v in dict_clusters2.items():
    print(k, v)
And I got something like:
1 ['hello this is','the car is red',....'performing test']
2 ['we already have','another comment',...'strings strings']
.
.
19 ['we have',' comment music',...'strings strings dance']
My dictionary maps each key to a list of several comments. I would like a CSV file like this:
1,'hello this is','the car is red',....'performing test'
2,'we already have','another comment',...'strings strings'
.
.
19,'we have',' comment music',...'strings strings dance'
However, it seems that some character is not mapped correctly and everything fails. Thanks in advance for any help.
The writerows method must take a list of dictionaries:
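# Python 3: the csv module needs a text-mode file ("w", not "wb"), opened with newline=""
Out = open("dictionary.csv", "w", newline="")
w = csv.DictWriter(Out, dict_clusters.keys())
w.writerows([dict_clusters])
Out.close()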
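The error in the question follows from this: iterating over a dictionary yields its keys, so writerows(dict_clusters) hands each key to the writer as if it were a row dictionary. Below is a minimal sketch of that failure. It assumes plain int keys in place of the numpy.int32 labels (to keep the example dependency-free) and uses made-up data. On Python 3.5 this surfaces as a TypeError, as in the question; other versions may raise an AttributeError instead, so both are caught.

```python
import csv
import io

# Hypothetical data: cluster label -> list of comments.
clusters = {0: ["a", "b"], 1: ["c"]}

# An in-memory buffer stands in for the output file.
writer = csv.DictWriter(io.StringIO(), fieldnames=list(clusters.keys()))

try:
    # Iterating a dict yields its keys, so each "row" here is an int, not a dict.
    writer.writerows(clusters)
    error_message = None
except (TypeError, AttributeError) as exc:
    error_message = str(exc)

print(error_message)
```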
You're probably looking for writerow, which takes a single dictionary object:
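# Python 3: the csv module needs a text-mode file ("w", not "wb"), opened with newline=""
Out = open("dictionary.csv", "w", newline="")
w = csv.DictWriter(Out, dict_clusters.keys())
w.writerow(dict_clusters)
Out.close()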
Aside: you might also want to consider using open as a context manager (in a with block) to ensure the file is properly closed:
with open("dictionary.csv", mode="w", newline="") as out_file:
    # The keyword argument is fieldnames, not headers
    writer = csv.DictWriter(out_file, fieldnames=dict_clusters.keys())
    writer.writerow(dict_clusters)
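Note that DictWriter.writerow(dict_clusters) writes a single row with one column per cluster, which is not the one-line-per-key layout requested in the question. Below is a minimal sketch of that layout using a plain csv.writer. It assumes Python 3, a small hypothetical dict_clusters in place of the real one, and an illustrative temporary output path. Keys are passed through int() in case they are numpy.int32, and the file is opened with an explicit UTF-8 encoding so that characters such as '\U0001f609' can be written.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for the real dictionary: cluster label -> list of comments.
dict_clusters = {
    0: ["hello this is", "the car is red"],
    1: ["we already have", "another comment \U0001f609"],
}

# Illustrative output location; any writable path works.
out_path = os.path.join(tempfile.mkdtemp(), "dictionary.csv")

# Text mode, newline="" (as the csv documentation recommends) and explicit UTF-8.
with open(out_path, "w", newline="", encoding="utf-8") as out_file:
    writer = csv.writer(out_file)
    for label, comments in dict_clusters.items():
        # One row per cluster: the label first, then each comment in its own column.
        writer.writerow([int(label)] + list(comments))

# Read the file back to check the layout.
with open(out_path, newline="", encoding="utf-8") as in_file:
    rows = list(csv.reader(in_file))

print(rows)
```

Rows have different lengths when clusters differ in size, which the csv module accepts; whether the program that reads the file accepts it is a separate question.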
Your special character renders like this in a Python 3 IPython session:
In [31]: '\U0001f609'
Out[31]: '😉'
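The character itself is a valid Unicode code point. The traceback in the question comes from the cp1252 codec, which has no mapping for it. A quick check, as a sketch:

```python
wink = "\U0001f609"

# cp1252 (the codec named in the traceback) cannot represent the character...
try:
    wink.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# ...while UTF-8 can; it needs four bytes for this code point.
utf8_bytes = wink.encode("utf-8")

print(cp1252_ok, utf8_bytes)
```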
Give us a small sample of the dictionary, or better yet the values that you use to build it.
I haven't worked with csv much, and csv.DictWriter even less. numpy users often write csv files with np.savetxt. That's easy to use when writing a purely numeric array. If you want to write a mix of character and numeric columns, it is trickier, requiring the use of a structured array.
Another option is to simply write a text file directly. Just open it, and use f.write(...) to write a formatted line to the file. In fact np.savetxt does essentially that:
with open(filename, 'w') as f:
    for row in myArray:
        f.write(fmt % tuple(row))
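As a self-contained illustration of that loop (a sketch of the idea, not the actual np.savetxt source), here it is with made-up rows and a made-up fmt string, writing to an in-memory buffer instead of a file:

```python
import io

# Hypothetical rows mixing a string, an integer and a float.
my_rows = [("a", 1, 2.5), ("b", 2, 3.0)]

# One conversion per column, terminated by a newline.
fmt = "%s, %d, %f\n"

buf = io.StringIO()  # stands in for an open text file
for row in my_rows:
    buf.write(fmt % tuple(row))

text = buf.getvalue()
print(text)
```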
savetxt constructs a fmt string like %s, %d, %f\n. It also works with bytestrings, requiring a wb mode. And as such could have even more problems with your special character.
It might help to focus on printing your dictionary, one key at a time, e.g.
for k in mydict.keys():
    print('%s, %s' % (k, mydict[k]))
for a start. Once you get the print format right, it is easy to convert that to a file write.
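For example, here is a sketch of that conversion, assuming a small made-up mydict and an illustrative temporary path. The same format string is written to a file opened with an explicit encoding='utf-8', so the result does not depend on the platform default (cp1252 in the question's traceback):

```python
import os
import tempfile

# Hypothetical dictionary: cluster number -> list of comments.
mydict = {1: ["hello this is", "wink \U0001f609"], 2: ["another comment"]}

# Illustrative path; any writable location works.
path = os.path.join(tempfile.mkdtemp(), "dict.txt")

with open(path, "w", encoding="utf-8") as f:
    for k in mydict.keys():
        # Same formatting as the print call, plus an explicit newline.
        f.write("%s, %s\n" % (k, mydict[k]))

# Read the file back to inspect what was written.
with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()

print(lines)
```

This writes the list in its Python repr form (brackets and quotes included). Getting one comment per column would need the values joined explicitly or passed through csv.writer.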
===============
I can write a hypothetical dictionary with your code:
In [58]: adict={1:'\U0001f609'}
In [59]: with open('test.txt','w') as f:
    ...:     writer=csv.writer(f)
    ...:     for k,v in adict.items():
    ...:         writer.writerow([k,v])
    ...:
In [60]: cat test.txt
1,😉
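That this works here while it fails in the question is presumably down to the default text encoding: the traceback in the question shows cp1252 being used. The sketch below reports which encoding open() would normally pick when no encoding argument is given, and whether it can represent the character. The exact lookup varies somewhat between Python versions, so treat it as a diagnostic rather than a guarantee.

```python
import locale

# Encoding that open() normally falls back to in text mode without encoding=.
default_encoding = locale.getpreferredencoding(False)

# Does that encoding have a mapping for the character from the traceback?
try:
    "\U0001f609".encode(default_encoding)
    can_encode = True
except UnicodeEncodeError:
    can_encode = False

print(default_encoding, can_encode)
```

If can_encode comes out False, passing encoding='utf-8' to open() is the usual way around it.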