
From Nested Dictionary to CSV File

I have a nested dictionary (with more than 70,000 entries):

users_item = {
    "sessionId1": {
        "12345645647": 1.0, 
        "9798654": 5.0 

    },         
    "sessionId2":{
        "3445657657": 1.0

    },
    "sessionId3": {
        "87967976": 5.0, 
        "35325626436": 1.0, 
        "126789435": 1.0, 
        "72139856": 5.0      
    },
    "sessionId4": {
        "4582317": 1.0         
    }
......
}

I want to create a CSV file from my nested dictionary. The result should look like this:

sessionId1 item rating
sessionId1 item rating
sessionId2 item rating
sessionId3 item rating
sessionId3 item rating
sessionId3 item rating
sessionId3 item rating
.......

I found this post: Convert Nested Dictionary to CSV Table

It's similar to my question, but none of the answers work when I try them: the pandas library runs out of memory.

How can I make a CSV file from my data?

Just loop through the dictionary and use the Python csv writer to write to the CSV file.

import csv

# newline='' lets the csv module control line endings (Python 3)
with open('output.csv', 'w', newline='') as csv_file:
    csvwriter = csv.writer(csv_file, delimiter='\t')
    for session in users_item:
        for item in users_item[session]:
            csvwriter.writerow([session, item, users_item[session][item]])

# Print each session together with its rating values
for session, ratings in users_item.items():
    for item, value in ratings.items():
        print("{} {}".format(session, value))

Output:

sessionId3 5.0
sessionId3 1.0
sessionId3 5.0
sessionId3 1.0
sessionId1 5.0
sessionId1 1.0
sessionId4 1.0
sessionId2 1.0

Note that a dict (users_item) has no guaranteed order before Python 3.7; from 3.7 on it keeps insertion order. So unless you specify the order of rows some other way, the output will be in the order the dict uses internally.
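If a reproducible row order matters, one option is to sort the keys explicitly. A minimal sketch, assuming the users_item structure from the question; the helper name iter_rows and the small sample dict are hypothetical. Because the keys are strings, the sort is lexicographic, not numeric.

```python
def iter_rows(users_item):
    """Yield (session, item, rating) tuples in sorted key order."""
    for session in sorted(users_item):
        ratings = users_item[session]
        for item in sorted(ratings):
            yield session, item, ratings[item]


# Small illustrative sample, deliberately out of order
sample = {
    "sessionId2": {"3445657657": 1.0},
    "sessionId1": {"9798654": 5.0, "12345645647": 1.0},
}

for session, item, rating in iter_rows(sample):
    print("{} {} {}".format(session, item, rating))
```

Since iter_rows is a generator, it produces one row at a time and does not build a second copy of the data.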

Edit: This approach has no problems with a file containing 70k entries.

Edit: If you want to write to a CSV file, use the csv module or just redirect the output to a file.

Assuming you want each session as a row, the number of columns for every row will be the total number of unique keys in all session dicts. Based on the data you've given, I'm guessing the number of unique keys is astronomical.

That is why you're running into memory issues with the solution given in this discussion. It's simply too much data to hold in memory at one time.
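One way to check that guess is to count the distinct item keys, since that is how many columns a one-row-per-session table would need. A rough sketch, using a small made-up stand-in for the real users_item:

```python
# Illustrative stand-in data; the real dict has more than 70,000 sessions
users_item = {
    "sessionId1": {"12345645647": 1.0, "9798654": 5.0},
    "sessionId2": {"3445657657": 1.0},
    "sessionId3": {"87967976": 5.0, "9798654": 1.0},
}

# Collect every distinct item key across all sessions
unique_items = set()
for ratings in users_item.values():
    unique_items.update(ratings)

n_rows = len(users_item)
n_cols = len(unique_items)
print("wide table: {} rows x {} columns = {} cells".format(
    n_rows, n_cols, n_rows * n_cols))
```

If the cell count is far larger than the number of actual ratings, most of the wide table would be empty, which is probably where the memory goes.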

If my assumptions are correct, your only option is to divide and conquer. Break the data into smaller chunks and write them to files in CSV format. Then merge the CSV files at the end.
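A minimal sketch of that chunk-and-merge idea follows. The helper names (write_chunks, merge_chunks), the chunk size, and the part-file naming are hypothetical choices for illustration, and the sketch writes long-format rows (session, item, rating) instead of the wide table.

```python
import csv
import itertools
import os
import shutil
import tempfile


def write_chunks(users_item, out_dir, chunk_size=2):
    """Write sessions to numbered CSV part files, chunk_size sessions each."""
    sessions = iter(sorted(users_item))
    paths = []
    for index in itertools.count():
        chunk = list(itertools.islice(sessions, chunk_size))
        if not chunk:
            break
        path = os.path.join(out_dir, "part_{:05d}.csv".format(index))
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            for session in chunk:
                for item, rating in users_item[session].items():
                    writer.writerow([session, item, rating])
        paths.append(path)
    return paths


def merge_chunks(paths, merged_path):
    """Concatenate the part files into one CSV, streaming each part."""
    with open(merged_path, "w", newline="") as out:
        for path in paths:
            with open(path, newline="") as part:
                shutil.copyfileobj(part, out)


# Small illustrative sample
users_item = {
    "sessionId1": {"12345645647": 1.0, "9798654": 5.0},
    "sessionId2": {"3445657657": 1.0},
    "sessionId3": {"87967976": 5.0},
}

out_dir = tempfile.mkdtemp()
parts = write_chunks(users_item, out_dir)
merged = os.path.join(out_dir, "merged.csv")
merge_chunks(parts, merged)

with open(merged, newline="") as f:
    rows = list(csv.reader(f))
print(len(parts), "parts,", len(rows), "rows")
```

The merge step copies each part through a buffer, so at no point does the whole output have to sit in memory.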

If you write the file iteratively, there should be no memory issues:

import csv

users_item = {
    "sessionId1": {
        "12345645647": 1.0,
        "9798654": 5.0

    },
    "sessionId2":{
        "3445657657": 1.0

    },
    "sessionId3": {
        "87967976": 5.0,
        "35325626436": 1.0,
        "126789435": 1.0,
        "72139856": 5.0
    },
    "sessionId4": {
        "4582317": 1.0
    }
}

with open('nested_dict.csv', 'w') as output:
    writer = csv.writer(output, delimiter='\t')
    for sessionId in sorted(users_item):
        ratings = users_item[sessionId]
        for item in ratings:
            writer.writerow([sessionId, item, ratings[item]])

Resulting contents of the output file (where » represents a tab character):

sessionId1»  12345645647»  1.0
sessionId1»  9798654»      5.0
sessionId2»  3445657657»   1.0
sessionId3»  126789435»    1.0
sessionId3»  87967976»     5.0
sessionId3»  35325626436»  1.0
sessionId3»  72139856»     5.0
sessionId4»  4582317»      1.0
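
A round trip is a quick way to check the output. The sketch below assumes a tab-delimited file like the one above; it writes its own small sample under the hypothetical name nested_dict_sample.csv so that it runs on its own, then reads the rows back one at a time and rebuilds the nested dict.

```python
import csv
import os
import tempfile

# Write a small tab-delimited sample so the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), "nested_dict_sample.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["sessionId1", "12345645647", 1.0])
    writer.writerow(["sessionId1", "9798654", 5.0])
    writer.writerow(["sessionId2", "3445657657", 1.0])

# Stream the rows back and rebuild the nested dict
rebuilt = {}
with open(path, newline="") as f:
    for session, item, rating in csv.reader(f, delimiter="\t"):
        # csv.reader returns strings, so convert the rating back to float
        rebuilt.setdefault(session, {})[item] = float(rating)

print(rebuilt)
```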
