Conceptual: writing a 2D matrix of dictionary results to a CSV file in Python

I have a dictionary formatted this way: each key is a tuple of the document number and the keyword, and each value is the frequency of that keyword within the document. So the keys would be (document1, keyword1), (document1, keyword2), (document1, keyword3), (document2, keyword1), (document2, keyword2), (document2, keyword3), (document3, keyword1), (document3, keyword2), and (document3, keyword3), and each value would be a number. Of course, this is a small dictionary; I am hoping to apply the solution to a large set of documents and keywords.

The dictionary was created like this:

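# The function name is illustrative; the original post omits the enclosing def.
def build_document_count(document_id_list, words):
    document_count = {}
    try:
        for doc in document_id_list:
            indiv_doc = ...  # records selected from a database
            for w in words:
                # make sure every (document, keyword) pair exists, even with a zero count
                document_count.setdefault((doc, w), 0)
                # iterate over: unsorted list of text tokenized, set to lower case, and stripped of stop words
                for entry in ...:
                    if entry == w and (doc, entry) in document_count:
                        document_count[(doc, entry)] += 1
        return document_count
    except Exception as e:
        print("create claim storages")
        print(str(e))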

I want to write the results to a CSV file laid out like a 2D matrix. At least, that is how I have seen it described.

          keyword1 keyword2 keyword3
document1 number   number   number
document2 number   number   number
document3 number   number   number

After looking at the csv module docs on python.org and other questions on this site, the closest I have gotten is this:

document1 keyword1 number
document1 keyword2 number
document1 keyword3 number
document2 keyword1 number
document2 keyword2 number
document2 keyword3 number
document3 keyword1 number
document3 keyword2 number
document3 keyword3 number 

This is the result of the code that I have written:

import csv
import os

with open(os.path.join('C:/Users/Tara/PyCharmProjects/untitled/csv_results/', file_name),
          'w', newline='') as csvfile:  # Python 3: text mode with newline=''
    w = csv.writer(csvfile)
    # one row per (document, keyword) pair
    for (doc, keyword), value in available_dict.items():
        w.writerow([doc, keyword, value])

I noticed that a lot of solutions involve list comprehensions, but I do not know what the correct if statement would be. Should I make the change when I build the dictionary, or when I write to the CSV file?

Many existing Python libraries handle the task of writing a CSV file, so I assume that you only want to use simple Python statements and structures.

The main strategy below is to write a generator function that creates the rows of the CSV file. The function first extracts and sorts the documents and the keywords from the dictionary, then yields a header row containing the keywords, then builds and yields one row per document.

I'm using a minimal number of list comprehensions, which could easily be avoided if you are willing to write a few more lines.

D = {
    ('doc1', 'key1'): 2, ('doc1', 'key2'): 3, ('doc1', 'key3'): 4,
    ('doc2', 'key1'): 4, ('doc2', 'key2'): 6, ('doc2', 'key3'): 8,
    ('doc3', 'key1'): 6, ('doc3', 'key2'): 9, ('doc3', 'key3'): 12,
}

def gen_rows(D):
    sorted_docs = sorted(set(t[0] for t in D))
    sorted_kwds = sorted(set(t[1] for t in D))
    yield [None,] + sorted_kwds
    for d in sorted_docs:
        yield [d,] + [D.get((d, k), 0) for k in sorted_kwds]

for row in gen_rows(D):
    print(row)

Here is the output, a list of rows ready to be written to a CSV file:

[None, 'key1', 'key2', 'key3']
['doc1', 2, 3, 4]
['doc2', 4, 6, 8]
['doc3', 6, 9, 12]
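
To get from these rows to an actual file, the generator can be handed straight to csv.writer. The following is only a minimal sketch, assuming Python 3: the names write_matrix and counts are illustrative and not from the original post, the in-memory buffer stands in for a real file opened with open(path, 'w', newline=''), and the top-left header cell is written as an empty string rather than None.

```python
import csv
import io

def gen_rows(counts):
    """Yield a header row, then one row per document (same idea as gen_rows above)."""
    docs = sorted(set(doc for doc, _ in counts))
    kwds = sorted(set(kwd for _, kwd in counts))
    yield [""] + kwds  # empty top-left cell above the document column
    for doc in docs:
        # pairs missing from the dictionary are written as 0
        yield [doc] + [counts.get((doc, kwd), 0) for kwd in kwds]

def write_matrix(counts, csvfile):
    """Write the document-by-keyword matrix to an open text-mode file object."""
    csv.writer(csvfile).writerows(gen_rows(counts))

# Hypothetical sample data; ('doc1', 'key3') and ('doc2', 'key2') are left out on purpose.
counts = {
    ('doc1', 'key1'): 2, ('doc1', 'key2'): 3,
    ('doc2', 'key1'): 4, ('doc2', 'key3'): 8,
}

buffer = io.StringIO()  # stands in for open(path, 'w', newline='')
write_matrix(counts, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Because the rows are generated one at a time, the full matrix is never held in memory as a list, which may help with a large set of documents and keywords; the sorted document and keyword lists are still built up front.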
