Creating a dictionary within a dictionary in Python from data read from a CSV file

I have a CSV file called sample.csv that contains the following data:

2014-07-18 01:00:00,UNKNOWN,UNKNOWN,UNKNOWN,UNKNOWN,10002
2014-07-18 01:00:00,UNKNOWN,UNKNOWN,UNKNOWN,UNKNOWN,10003
2014-07-18 01:15:00,UNKNOWN,UNKNOWN,UNKNOWN,UNKNOWN,10004
2014-07-18 01:15:00,UNKNOWN,UNKNOWN,UNKNOWN,UNKNOWN,10005
2014-07-18 01:30:00,UNKNOWN,UNKNOWN,UNKNOWN,UNKNOWN,10006
2014-07-18 01:30:00,UNKNOWN,UNKNOWN,UNKNOWN,UNKNOWN,10007
2014-07-18 01:45:00,UNKNOWN,UNKNOWN,UNKNOWN,UNKNOWN,10008
2014-07-18 01:45:00,UNKNOWN,UNKNOWN,UNKNOWN,UNKNOWN,10009

I am trying to write a Python script that reads every line of this CSV file. For each line, I want to take the hour (here '01') as the top-level key, the minute as the subkey, and the remaining fields as the values.

Here is my code snippet:

def connection():
        os.chdir("record_output/")
        mydict = {}
        for files in glob.glob("*.csv"):
                fo = open(files, "r")
                data = fo.readlines()
                for lines in data:
                        lines = lines.split(',')
                        dateObject = datetime.strptime(lines[0],"%Y-%m-%d %H:%M:%S")
                        hour = dateObject.hour
                        minute = dateObject.minute
                        fields = lines[1:]

Here I get the hour, the minute and the remaining fields, but I am struggling to create the desired output: the hour as the top-level key, the minute as the subkey, and the corresponding fields as the values, for every minute in that hour ('00', '15', '30' and '45') and for every hour. Currently this CSV file covers only one hour, but in future it may cover more.

Check whether the key exists before inserting a new value (this goes inside your last, innermost loop):

# dict.has_key() was removed in Python 3; `in` works in both Python 2 and 3
if hour not in mydict:
    mydict[hour] = {}
mydict[hour][minute] = fields

This is untested, but it should work. It should give you something like:

{1: {33: 22, 34: 25}}

Here 1 is the hour, 33 and 34 are the minutes, and 22 and 25 are the values (which can be strings or anything else).
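One caveat with plain assignment is worth seeing on the question's own data. The following is a minimal sketch, assuming Python 3; the `rows` list and the names `parts` and `stamp` are illustrative and not from the original post. Both rows share the same hour and minute, so the second assignment replaces the first.

```python
from datetime import datetime

# Two rows from the question's sample.csv that share the same timestamp
rows = [
    "2014-07-18 01:00:00,UNKNOWN,UNKNOWN,UNKNOWN,UNKNOWN,10002",
    "2014-07-18 01:00:00,UNKNOWN,UNKNOWN,UNKNOWN,UNKNOWN,10003",
]

mydict = {}
for row in rows:
    parts = row.split(",")
    stamp = datetime.strptime(parts[0], "%Y-%m-%d %H:%M:%S")
    if stamp.hour not in mydict:
        mydict[stamp.hour] = {}
    # Plain assignment: a later row with the same minute overwrites the earlier one
    mydict[stamp.hour][stamp.minute] = parts[1:]

print(mydict)
```

Only the row ending in 10003 survives under `mydict[1][0]`; the row ending in 10002 is lost.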

EDIT: True, each minute must hold a list so that several values can be stored under the same minute. Apply the same check to the minutes, like this:

if hour not in mydict:
    mydict[hour] = {}

if minute not in mydict[hour]:
    # A list here, because there are no more key levels
    mydict[hour][minute] = []

mydict[hour][minute].append(fields)

So the output should look like this:

{1: {33: [["a value, undefined,...", 22], ["test1"]], 34: [[33, "test2", "test945723"]]}}
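As an alternative to the explicit key checks, the same hour -> minute -> list structure can be built with `collections.defaultdict` from the standard library. This is only a sketch, assuming Python 3; `nested`, `sample_lines` and `plain` are illustrative names, and the rows are copied from the question's sample data.

```python
from collections import defaultdict
from datetime import datetime

# hour -> minute -> list of rows; missing keys are created on first access
nested = defaultdict(lambda: defaultdict(list))

sample_lines = [
    "2014-07-18 01:00:00,UNKNOWN,UNKNOWN,UNKNOWN,UNKNOWN,10002",
    "2014-07-18 01:00:00,UNKNOWN,UNKNOWN,UNKNOWN,UNKNOWN,10003",
    "2014-07-18 01:15:00,UNKNOWN,UNKNOWN,UNKNOWN,UNKNOWN,10004",
]

for line in sample_lines:
    parts = line.strip().split(",")
    stamp = datetime.strptime(parts[0], "%Y-%m-%d %H:%M:%S")
    # No existence checks needed: both levels are created automatically
    nested[stamp.hour][stamp.minute].append(parts[1:])

# Convert to plain dicts, e.g. for printing or serialising
plain = {hour: dict(minutes) for hour, minutes in nested.items()}
```

The trade-off is that merely reading a missing key from a `defaultdict` creates it, which is why the sketch converts to plain dicts once loading is done.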

EDIT II: The final code would be:

import glob
import os
from datetime import datetime

def connection():
        os.chdir("record_output/")
        mydict = {}
        for files in glob.glob("*.csv"):
                # "with" closes the file even if parsing fails
                with open(files, "r") as fo:
                        for lines in fo:
                                # Strip the trailing newline before splitting
                                lines = lines.strip().split(',')
                                dateObject = datetime.strptime(lines[0],"%Y-%m-%d %H:%M:%S")
                                hour = dateObject.hour
                                minute = dateObject.minute
                                fields = lines[1:]

                                if hour not in mydict:
                                        mydict[hour] = {}

                                if minute not in mydict[hour]:
                                        # A list here, because there are no more key levels
                                        mydict[hour][minute] = []

                                mydict[hour][minute].append(fields)
        return mydict

If it doesn't work, check your loop:

fo = open(files, "r")
data = fo.readlines()
for lines in data:
    print(lines)

And try changing:

  for lines in data:
    lines = lines.split(',')

to use a new variable called row:

  for row in data:
    lines = row.split(',')

And add print statements to debug the program.

A solution with the csv module:

import csv

import dateutil.parser  # third-party package: python-dateutil

data_dict = {}
with open('data.csv', 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        dt = dateutil.parser.parse(row[0])
        # Create the hour and minute levels only when they are missing
        if dt.hour not in data_dict:
            data_dict[dt.hour] = {}
        if dt.minute not in data_dict[dt.hour]:
            data_dict[dt.hour][dt.minute] = []
        data_dict[dt.hour][dt.minute].append(row[1:])
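Since the timestamps in the question all follow one fixed format, the same idea could probably be written with only the standard library, using `datetime.strptime` in place of `dateutil`. The sketch below assumes Python 3; the function name `group_by_hour_minute` is hypothetical, and an in-memory `io.StringIO` stands in for the real file so the example runs on its own.

```python
import csv
import io
from datetime import datetime

def group_by_hour_minute(csvfile):
    """Return {hour: {minute: [remaining fields of each row]}}."""
    grouped = {}
    for row in csv.reader(csvfile):
        if not row:  # skip blank lines
            continue
        stamp = datetime.strptime(row[0], "%Y-%m-%d %H:%M:%S")
        # setdefault creates each level only if it is missing
        grouped.setdefault(stamp.hour, {}).setdefault(stamp.minute, []).append(row[1:])
    return grouped

# In-memory stand-in for the file, using rows from the question
sample = io.StringIO(
    "2014-07-18 01:30:00,UNKNOWN,UNKNOWN,UNKNOWN,UNKNOWN,10006\n"
    "2014-07-18 01:30:00,UNKNOWN,UNKNOWN,UNKNOWN,UNKNOWN,10007\n"
    "2014-07-18 01:45:00,UNKNOWN,UNKNOWN,UNKNOWN,UNKNOWN,10008\n"
)
result = group_by_hour_minute(sample)

# Walk the nested structure in hour, then minute order
for hour, minutes in sorted(result.items()):
    for minute, rows in sorted(minutes.items()):
        print("%02d:%02d -> %d row(s)" % (hour, minute, len(rows)))
```

With a real file, the call would look like `with open('sample.csv', newline='') as f: result = group_by_hour_minute(f)`.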
