
Create a JSON/text file based on contents of a csv file

I am trying to loop through a csv file (approximately 91 million records) and create a new json/text file using a Python dict, based on the sample records below (the file is sorted on id, type).

id,type,value
4678367,1,1001
4678367,2,1007
4678367,2,1008
5678945,1,9000
5678945,2,8000

The code should append values when the id and type match; otherwise it should create a new record, as shown below. I would like to write the result to a target file.

How can I do this in Python?

{'id':4678367,
 'id_1':[1001],
 'id_2':[1007,1008]
},
{'id':5678945,
 'id_1':[9000],
 'id_2':[8000]
}

Here is one way to collect the items. I have left writing them to a file as an exercise:

Code:

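import csv

with open('test.csv', newline='') as f:
    reader = csv.reader(f)
    columns = next(reader)  # skip the header row: id,type,value
    results = []
    record = {}
    current_type = None
    items = []
    for id_, type_, value in reader:
        # Flush the collected values whenever the id or the type changes,
        # so values never carry over from one id to the next
        if id_ != record.get('id') or type_ != current_type:
            if record:
                record['id_{}'.format(current_type)] = items
            items = []
            current_type = type_

        # Start a new record whenever the id changes
        if id_ != record.get('id'):
            if record:
                results.append(record)
            record = dict(id=id_)

        items.append(value)

    # Flush the last group and the last record
    if record:
        record['id_{}'.format(current_type)] = items
        results.append(record)

print(results)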

Results:

[
    {'id': '4678367', 'id_1': ['1001'], 'id_2': ['1007', '1008']}, 
    {'id': '5678945', 'id_1': ['9000'], 'id_2': ['8000']}
]
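
With roughly 91 million records, keeping every record in a list may not be practical. Because the file is sorted on id and type, the same grouping can be streamed with itertools.groupby and written one record at a time. The following is only a minimal sketch, not part of the answer above: the function name write_grouped, the one-JSON-object-per-line output layout, and the conversion of ids and values to int are all assumptions.

```python
import csv
import json
from itertools import groupby
from operator import itemgetter


def write_grouped(src_path, dst_path):
    """Stream a CSV sorted on id,type into one JSON object per line.

    Hypothetical helper: the name and the output layout are assumptions.
    Returns the number of records written.
    """
    count = 0
    with open(src_path, newline='') as src, open(dst_path, 'w') as dst:
        reader = csv.reader(src)
        next(reader)  # skip the header row: id,type,value
        # groupby only merges adjacent rows, so this relies on the sort order
        for id_, rows in groupby(reader, key=itemgetter(0)):
            record = {'id': int(id_)}
            for type_, typed_rows in groupby(rows, key=itemgetter(1)):
                record['id_{}'.format(type_)] = [int(r[2]) for r in typed_rows]
            dst.write(json.dumps(record) + '\n')
            count += 1
    return count
```

Only one id's worth of rows is held in memory at a time. If the input were not sorted, the same id could be written more than once, so this sketch depends on the ordering stated in the question.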

Here is another way, using a namedtuple for the rows and a dict keyed by id:

import csv
from collections import namedtuple

with open("data.csv", "r", newline="") as f:
    read = csv.reader(f)
    header = next(read)
    # Build a row type from the header so columns can be accessed by name
    col = namedtuple('col', header)
    dictionary = {}
    for values in read:
        data = col(*values)
        type_ = 'id_' + str(data.type)
        if data.id in dictionary:
            local_dict = dictionary[data.id]
            if type_ in local_dict:
                local_dict[type_].append(data.value)
            else:
                local_dict[type_] = [data.value]
        else:
            # First row for this id: create its record
            dictionary[data.id] = {'id': data.id, type_: [data.value]}
print(*dictionary.values(), sep="\n")

Results:

{'id': '4678367', 'id_1': ['1001'], 'id_2': ['1007', '1008']}
{'id': '5678945', 'id_1': ['9000'], 'id_2': ['8000']}
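
The nested membership checks in this approach can also be written with dict.setdefault, and the collected records can then be written to the target file with json.dump. This is a hedged sketch rather than the answer's own code: the helper names collect_records and write_records are hypothetical, and like the code above it keeps every record in memory, which may not suit a 91-million-row file.

```python
import csv
import json


def collect_records(src_path):
    """Group rows by id without relying on the file's sort order.

    Hypothetical helper; it keeps every record in memory.
    """
    records = {}
    with open(src_path, newline='') as f:
        for row in csv.DictReader(f):
            # Create the record for this id on first sight, then reuse it
            record = records.setdefault(row['id'], {'id': row['id']})
            # Create the list for this type on first sight, then append to it
            record.setdefault('id_' + row['type'], []).append(row['value'])
    return list(records.values())


def write_records(records, dst_path):
    """Write the collected records to dst_path as a single JSON array."""
    with open(dst_path, 'w') as f:
        json.dump(records, f, indent=1)
```

Values stay as strings here, as in the results shown above; wrapping them in int() would match the integer output requested in the question.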
