
How to fix the following issue writing a dictionary into a csv?

Hello, I am working with sklearn and using k-means for natural language processing. I used KMeans to cluster comments, then built a dictionary with the cluster number as the key and the list of associated comments as the value, as follows:

dict_clusters = {}
for i in range(0,len(kmeans.labels_)):
    #print(kmeans.labels_[i])
    #print(listComments[i])
    if not kmeans.labels_[i] in dict_clusters:
        dict_clusters[kmeans.labels_[i]] = []
    dict_clusters[kmeans.labels_[i]].append(listComments[i])
print("dictionary constructed")

I would like to write this dictionary to a CSV file. I tried:

Out = open("dictionary.csv", "wb")
w = csv.DictWriter(Out,dict_clusters.keys())
w.writerows(dict_clusters)
Out.close()

However, I am not sure what is wrong. I get the following error, and I do not know whether it is related to numpy, since kmeans.labels_ contains several values:

Traceback (most recent call last):
  File "C:/Users/CleanFile.py", line 133, in <module>
    w.writerows(dict_clusters)
  File "C:\Program Files\Anaconda3\lib\csv.py", line 156, in writerows
    return self.writer.writerows(map(self._dict_to_list, rowdicts))
  File "C:\Program Files\Anaconda3\lib\csv.py", line 146, in _dict_to_list
    wrong_fields = [k for k in rowdict if k not in self.fieldnames]
TypeError: 'numpy.int32' object is not iterable

I would appreciate help with this. I want a CSV of my dictionary that looks like this:

key1, value
key2, value
.
.
.
keyN, value

After feedback from here I tried:

with open("dictionary.csv", mode="wb") as out_file:
    writer = csv.DictWriter(out_file, headers=dict_clusters.keys())
    writer.writerow(dict_clusters)

I got:

Traceback (most recent call last):
  File "C:/Users/CleanFile.py", line 129, in <module>
    writer = csv.DictWriter(out_file, headers=dict_clusters.keys())
TypeError: __init__() missing 1 required positional argument: 'fieldnames'

Attempt 2:

Out = open("dictionary.csv", "wb")
w = csv.DictWriter(Out,dict_clusters.keys())
w.writerows([dict_clusters])
Out.close()

Output:

Traceback (most recent call last):
  File "C:/Users/CleanFile.py", line 130, in <module>
    w.writerows([dict_clusters])
  File "C:\Program Files\Anaconda3\lib\csv.py", line 156, in writerows
    return self.writer.writerows(map(self._dict_to_list, rowdicts))
TypeError: a bytes-like object is required, not 'str'

Attempt 3 (this attempt takes a long time to compute the output):

Out = open("dictionary.csv", "wb")
w = csv.DictWriter(Out,dict_clusters.keys())
w.writerow(dict_clusters)
Out.close()

The version of Python that I am using is the following:

3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
3.5.2

After many attempts I decided to build my dictionary in a better way, as follows:

from collections import defaultdict
pairs = zip(y_pred, listComments)

dict_clusters2 = defaultdict(list)

for num, comment in pairs:
    dict_clusters2[num].append(comment)

However, it seems that some character makes the creation of the CSV file fail, as follows:

with open('dict.csv', 'w') as csv_file:
    writer = csv.writer(csv_file)
    for key, value in dict_clusters2.items():
       writer.writerow([key, value])

Output:

Traceback (most recent call last):
  File "C:/Users/CleanFile.py", line 146, in <module>
    writer.writerow([key, value])
  File "C:\Program Files\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f609' in position 6056: character maps to <undefined>

To make this clearer, I ran:

for k,v in dict_clusters2.items():
    print(k, v)

And I got something like:

1 ['hello this is','the car is red',....'performing test']
2 ['we already have','another comment',...'strings strings']
.
.
19 ['we have',' comment music',...'strings strings dance']

My dictionary has a key and a list of several comments. I would like to have a CSV as follows:

1,'hello this is','the car is red',....'performing test'
2,'we already have','another comment',...'strings strings'
.
.
19,'we have',' comment music',...'strings strings dance'

However, it seems that some character is not mapped correctly and everything fails. I would appreciate any help, thanks.

The writerows method must take a list of dictionaries:

# Python 3: open in text mode; the csv module writes str, not bytes
Out = open("dictionary.csv", "w", newline="")
w = csv.DictWriter(Out,dict_clusters.keys())
w.writerows([dict_clusters])
Out.close()

You're probably looking for writerow which takes a single dictionary object:

# Python 3: open in text mode; the csv module writes str, not bytes
Out = open("dictionary.csv", "w", newline="")
w = csv.DictWriter(Out,dict_clusters.keys())
w.writerow(dict_clusters)
Out.close()

Aside: you might also want to consider using open as a context manager (in a with block) to ensure the file is properly closed:

with open("dictionary.csv", mode="w", newline="") as out_file:
    # The keyword argument is fieldnames, not headers
    writer = csv.DictWriter(out_file, fieldnames=dict_clusters.keys())
    writer.writerow(dict_clusters)
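
Note that DictWriter.writerow(dict_clusters) writes a single row with every key as its own column. For the one-row-per-key layout asked for in the question, a plain csv.writer may be a better fit. The following is only a sketch under some assumptions: the small dict_clusters below is a made-up stand-in for the real data, write_clusters is a hypothetical helper name, and the file goes to a temporary directory just so the example is self-contained. An explicit UTF-8 encoding is used so that the result does not depend on the platform default.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for dict_clusters: cluster id -> list of comments.
dict_clusters = {
    0: ["hello this is", "the car is red"],
    1: ["we have, maybe", "wink \U0001f609"],
}

def write_clusters(clusters, path):
    # Text mode with newline="" is what the csv module expects in Python 3;
    # the explicit encoding avoids relying on the platform default (e.g. cp1252).
    with open(path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for key, comments in clusters.items():
            # One row per cluster: the id first, then each comment in its own column.
            writer.writerow([key, *comments])

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "dictionary.csv")
    write_clusters(dict_clusters, csv_path)
    # Read the file back to check what was written.
    with open(csv_path, newline="", encoding="utf-8") as in_file:
        rows = list(csv.reader(in_file))
```

Comments that contain commas are quoted by the writer automatically, so they survive the round trip as a single field.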

Your special character, in a Py3 IPython session, renders as:

In [31]:  '\U0001f609'
Out[31]: '😉'

Give us a small sample of the dictionary, or better yet the values that you use to build it.

I haven't worked with csv much, and csv.DictWriter even less. numpy users often write csv files with np.savetxt. That's easy to use when writing a purely numeric array. If you want to write a mix of character and numeric columns, it is trickier, requiring the use of a structured array.

Another option is to simply write a text file directly. Just open it, and use f.write(...) to write a formatted line to the file. In fact np.savetxt does essentially that:

with open(filename, 'w') as f:
    for row in myArray:
       f.write(fmt % tuple(row))

savetxt constructs a fmt string like %s, %d, %f\n. It also works with bytestrings, requiring a wb mode. And as such could have even more problems with your special character.

It might help to focus on printing your dictionary, one key at a time, e.g.

for k in mydict.keys():
    print('%s, %s' % (k, mydict[k]))

for a start. Once you get the print format right, it is easy to convert that to a file write.
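
As a rough illustration of that conversion, here is a minimal sketch. The names format_line and dump are hypothetical, mydict is made-up sample data, and an in-memory io.StringIO stands in for a real file opened with open(filename, 'w', encoding='utf-8').

```python
import io

# Made-up sample data: key -> list of comments.
mydict = {1: ["hello this is", "the car is red"], 2: ["another comment"]}

def format_line(key, values):
    # Same idea as the print format above, with the list joined into one line.
    return "%s, %s\n" % (key, ", ".join(values))

def dump(d, f):
    # f can be any object with a write() method, e.g. an open text file.
    for k in d:
        f.write(format_line(k, d[k]))

buf = io.StringIO()
dump(mydict, buf)
text = buf.getvalue()
```

This does no quoting at all, so a comment that itself contains a comma would be indistinguishable from two comments; the csv module handles that case for you.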

===============

I can write a hypothetical dictionary with your code:

In [58]: adict={1:'\U0001f609'}
In [59]: with open('test.txt','w') as f:
    ...:     writer=csv.writer(f)
    ...:     for k,v in adict.items():
    ...:         writer.writerow([k,v])
    ...:         
In [60]: cat test.txt
1,😉
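
That this works here but fails in the question is probably down to the default file encoding: the traceback in the question goes through cp1252.py, while my session presumably defaults to UTF-8 (an assumption on my part). A small sketch of the difference, using only str.encode:

```python
wink = "\U0001f609"

# cp1252 has no mapping for this character, which is what the traceback reports.
try:
    wink.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# UTF-8 can encode any Unicode character.
utf8_bytes = wink.encode("utf-8")
```

If that is the cause, passing encoding='utf-8' to open(), as in open('dict.csv', 'w', newline='', encoding='utf-8'), should let the original csv.writer loop run unchanged.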
