Two-dimensional dictionary, list or array from a CSV file in Python

Question

I am very new to Python.

I am trying to read a regression coefficient matrix into Python from a CSV file in the format below:

 0.10 0.15 0.20 0.25 0.30 0.35 
a1 -0.0011 0.0008 0.0019 0.0034 0.0067 0.0047-0.0026 
a2 0.0134 -0.3042 -0.2531 -0.2138 -1.2345 -0.2380 2.0402 
a3 0.0546 0.2708 0.1738 0.0810 0.8451 -0.0034 -1.4961 
a4 -0.0226 -0.0052 -0.0021 -0.0024 -0.0023 -0.0745 0.0563 
a5 -0.0101 0.0108 0.0153 0.0263 0.0491 0.0327 -0.0691

I need to be able to access a specific element of this matrix, for example a['a1','0.10'] = -0.0011. I thought a dict would be suitable for storing this data, but I find it hard to make it two-dimensional.
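As background, a plain Python dict already supports the a['a1','0.10'] access pattern if each key is a (row label, column label) tuple; nested dicts are the other common option. A minimal illustration, using a few values copied by hand from the matrix above rather than read from the file:

```python
# 1) One dict keyed by (row label, column label) tuples
a = {
    ('a1', '0.10'): -0.0011,
    ('a1', '0.15'): 0.0008,
    ('a2', '0.10'): 0.0134,
}
# Inside [], a['a1', '0.10'] is the same as a[('a1', '0.10')]
print(a['a1', '0.10'])    # -0.0011

# 2) Nested dicts: the outer key is the row label, the inner key the column label
b = {
    'a1': {'0.10': -0.0011, '0.15': 0.0008},
    'a2': {'0.10': 0.0134},
}
print(b['a1']['0.10'])    # -0.0011
```

Tuple keys keep a single flat dict; nested dicts make it easy to grab a whole row at once (b['a1']).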

I have managed to read this data into a dictionary with the top-row elements as keys, but I do not know how to accomplish the double keying I want. The code I used is below:

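import csv
import os
import sys

# sys.path[0] is the directory containing the running script
path = os.path.join(sys.path[0], "DSYHScoeff_98.dat")

result = {}
with open(path, newline='') as f:
    reader = csv.DictReader(f, delimiter=' ')
    for row in reader:
        # Append each value to the list stored under its column header
        for column, value in row.items():
            result.setdefault(column, []).append(value)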

Do you have any suggestions for a good way to handle this data?

Best regards, Adam

Answer 1

Go with pandas; it's designed for this kind of thing:

>>> import pandas as pd
>>> names = ['0.10', '0.15', '0.20', '0.25', '0.30', '0.35', '0.40']
>>> i = pd.read_csv('test.csv', sep=r'\s+', names=names)
>>> i
     0.10    0.15    0.20    0.25    0.30    0.35    0.40
0 -0.0011  0.0008  0.0019  0.0034  0.0067  0.0047 -0.0026
1  0.0134 -0.3042 -0.2531 -0.2138 -1.2345 -0.2380  2.0402
2  0.0546  0.2708  0.1738  0.0810  0.8451 -0.0034 -1.4961
3 -0.0226 -0.0052 -0.0021 -0.0024 -0.0023 -0.0745  0.0563
4 -0.0101  0.0108  0.0153  0.0263  0.0491  0.0327 -0.0691
>>> i['0.10'][0]
-0.0011000000000000001
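The session above reads a file without row labels, so rows are numbered 0 to 4. A sketch that keeps the a1, a2, ... labels instead, assuming a well-formed whitespace-separated file whose first header cell is given a placeholder label ("row" here) and whose data rows have exactly one value per column label (the question's file, with its fused 0.0047-0.0026 value, would need fixing first):

```python
import io

import pandas as pd

# Hypothetical well-formed excerpt of the matrix; io.StringIO stands in for the file
text = """row 0.10 0.15 0.20
a1 -0.0011 0.0008 0.0019
a2 0.0134 -0.3042 -0.2531
a3 0.0546 0.2708 0.1738
"""

# sep=r"\s+" splits on runs of whitespace (recent pandas deprecates delim_whitespace=True
# in its favour); index_col=0 turns the a1, a2, ... labels into the row index.
df = pd.read_csv(io.StringIO(text), sep=r"\s+", index_col=0)

# Header cells are kept as strings, so the column label is "0.10", not 0.1
print(df.loc["a1", "0.10"])
```

df.loc[row, col] gives exactly the a['a1','0.10'] style of access from the question; df.at[row, col] does the same for a single scalar.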

Answer 2

Honestly, I'd do it manually.

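header, data = None, {}
with open("filename.csv") as f:
    for line in f:
        fields = line.split()            # split on any run of whitespace
        if not fields:                   # skip blank lines
            continue
        if header is None:               # first line holds the column labels
            header = fields
            continue
        # key each value by (row label, column label), e.g. data['a1', '0.10']
        for i in range(len(fields) - 1):
            data[fields[0], header[i]] = fields[i + 1]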

It works once I made the adjustments that tobias_k also mentioned in their comment.
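The same manual approach can be wrapped in a function that converts the values to float and rejects rows whose value count does not match the header (the loop above would instead raise IndexError on extra values, or silently leave a row incomplete on missing ones). A minimal sketch; `read_matrix` is a hypothetical name, and the sample is a hand-made, well-formed excerpt:

```python
import io


def read_matrix(lines):
    """Build a dict keyed by (row label, column label) with float values."""
    header = None
    data = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue                      # skip blank lines
        if header is None:
            header = fields               # first non-blank line: column labels
            continue
        label, values = fields[0], fields[1:]
        if len(values) != len(header):
            raise ValueError(
                f"row {label!r} has {len(values)} values but the header has {len(header)} labels"
            )
        for col, value in zip(header, values):
            data[label, col] = float(value)   # float() raises ValueError on fused values
    return data


# io.StringIO stands in for the opened file
sample = io.StringIO(
    " 0.10 0.15 0.20\n"
    "a1 -0.0011 0.0008 0.0019\n"
    "a2 0.0134 -0.3042 -0.2531\n"
)
matrix = read_matrix(sample)
print(matrix["a1", "0.10"])   # -0.0011
```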

Answer 3

What I would probably do is add something like "ax" to the beginning of the first line of the file:

ax 0.10 0.15 0.20 0.25 0.30 0.35 
a1 -0.0011 0.0008 0.0019 0.0034 0.0067 0.0047 -0.0026 
[...]

And then change your code slightly:

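result = {}
for row in reader:
    # The first header field ("ax") labels the row-name column; pop it off as the row key
    x = row.pop(reader.fieldnames[0])
    for column, value in row.items():
        # Skip the empty ('') and None columns created by trailing spaces and surplus values
        if column and value:
            y = float(column)
            result[x, y] = float(value)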

It should work:

>>> result['a3',0.15]
0.2708
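A self-contained version of this answer, with io.StringIO standing in for the file so it can be run directly; the three-column sample is hypothetical and keeps the trailing space at the end of each line, as in the question's file:

```python
import csv
import io

# Hypothetical excerpt in this answer's layout: "ax" labels the first column
sample = io.StringIO(
    "ax 0.10 0.15 0.20 \n"
    "a1 -0.0011 0.0008 0.0019 \n"
    "a3 0.0546 0.2708 0.1738 \n"
)

reader = csv.DictReader(sample, delimiter=' ')
result = {}
for row in reader:
    x = row.pop(reader.fieldnames[0])      # row label, e.g. 'a3'
    for column, value in row.items():
        if column and value:               # skip the '' column produced by the trailing space
            result[x, float(column)] = float(value)

print(result['a3', 0.15])   # 0.2708
```

Because the column keys are floats, a key has to compare equal to the parsed header value: result['a1', 0.1] works, but a key computed by arithmetic such as 0.1 + 0.2 does not equal float('0.30'), so string column keys are the safer choice if keys are ever computed.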

Answer 4

First, you have to add a label to your first column:

# ▼▼▼
  row 0.10 0.15 0.20 0.25 0.30 0.35 
  a1 -0.0011 0.0008 0.0019 0.0034 0.0067 0.0047-0.0026 
  a2 0.0134 -0.3042 -0.2531 -0.2138 -1.2345 -0.2380 2.0402 
# [...]

After that, it is only a matter of looking up the row's position in the "row" column. Wrapped in a function:

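def cell(arr, row, col):
    """Look up a value by row label and column label in a column-keyed dict."""
    try:
        return arr[col][arr['row'].index(row)]
    except (KeyError, ValueError):   # unknown column, or row label not in arr['row']
        return "N/A"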
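Since list.index() scans the "row" list on every lookup, a variant can build the row label to position mapping once and reuse it. A sketch under that idea; `make_cell_lookup` is a hypothetical helper, not part of the original answer, and the small dict below only mimics the shape of the result printed further down:

```python
def make_cell_lookup(arr, label_column="row"):
    """Return a cell(row, col) function over a column-keyed dict."""
    # Built once: row label -> list position (if labels repeat, the last one wins)
    positions = {label: i for i, label in enumerate(arr[label_column])}

    def cell(row, col):
        try:
            return arr[col][positions[row]]
        except KeyError:          # unknown row label or unknown column label
            return "N/A"

    return cell


# Hypothetical column-keyed dict in the same shape as the pprinted result
result = {
    "row": ["a1", "a2", "a3"],
    "0.10": ["-0.0011", "0.0134", "0.0546"],
    "0.15": ["0.0008", "-0.3042", "0.2708"],
}

cell = make_cell_lookup(result)
print(cell("a3", "0.15"))   # 0.2708
print(cell("a1", "0.14"))   # N/A
```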

Given your input file (with the label added) and your code:

#
# insert your code here
#

from pprint import pprint
pprint(result)

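def cell(arr, row, col):
    """Look up a value by row label and column label in a column-keyed dict."""
    try:
        return arr[col][arr['row'].index(row)]
    except (KeyError, ValueError):   # unknown column, or row label not in arr['row']
        return "N/A"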

pprint(cell(result, 'a1', '0.10'))
pprint(cell(result, 'a1', '0.14'))

Producing:

{None: [[''], [''], [''], ['']],
 '': ['', '2.0402', '-1.4961', '0.0563', '-0.0691'],
 '0.10': ['-0.0011', '0.0134', '0.0546', '-0.0226', '-0.0101'],
 '0.15': ['0.0008', '-0.3042', '0.2708', '-0.0052', '0.0108'],
 '0.20': ['0.0019', '-0.2531', '0.1738', '-0.0021', '0.0153'],
 '0.25': ['0.0034', '-0.2138', '0.0810', '-0.0024', '0.0263'],
 '0.30': ['0.0067', '-1.2345', '0.8451', '-0.0023', '0.0491'],
 '0.35': ['0.0047-0.0026', '-0.2380', '-0.0034', '-0.0745', '0.0327'],
 'row': ['a1', 'a2', 'a3', 'a4', 'a5']}
'-0.0011'
'N/A'

(Please note that your input data file is probably not well formed; this is pretty obvious from the pprinted dictionary. See the comments on your question for details.)
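The malformed rows can also be found mechanically before building any dictionary. A minimal sketch, assuming the layout from the question (a header row of column labels, then rows of a label followed by values); `find_malformed_rows` is a hypothetical helper, and the sample is the first three lines of the question's data:

```python
def find_malformed_rows(lines):
    """Report data rows whose value count or contents do not match the header row."""
    header = None
    problems = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue                      # ignore blank lines
        if header is None:
            header = fields               # first non-blank line: column labels
            continue
        label, values = fields[0], fields[1:]
        if len(values) != len(header):
            problems.append((lineno, label, f"{len(values)} values for {len(header)} labels"))
        for value in values:
            try:
                float(value)
            except ValueError:            # e.g. two numbers fused together
                problems.append((lineno, label, f"not a number: {value!r}"))
    return problems


sample = [
    " 0.10 0.15 0.20 0.25 0.30 0.35 \n",
    "a1 -0.0011 0.0008 0.0019 0.0034 0.0067 0.0047-0.0026 \n",
    "a2 0.0134 -0.3042 -0.2531 -0.2138 -1.2345 -0.2380 2.0402 \n",
]
for problem in find_malformed_rows(sample):
    print(problem)
# (2, 'a1', "not a number: '0.0047-0.0026'")
# (3, 'a2', '7 values for 6 labels')
```

This points at the same two issues visible in the pprinted dictionary: the fused value in row a1 and a header row that is one column label short for the remaining rows.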

Question

4 answers

Answer 1
3 2014-08-28 11:39:34

Answer 2
0 2014-08-28 11:33:52

Answer 3
0 2014-08-28 11:37:59

Answer 4
0 2014-08-28 11:45:53
