python dict to numpy structured array

I have a dictionary that I need to convert to a NumPy structured array. I'm using the arcpy function NumPyArrayToTable, so a NumPy structured array is the only data format that will work.

Based on this thread: Writing to numpy array from dictionary, and this thread: How to convert Python dictionary object to numpy array

I've tried this:

result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734, 3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266, 6: 0.20941558441558442}

names = ['id','data']
formats = ['f8','f8']
dtype = dict(names = names, formats=formats)
array=numpy.array([[key,val] for (key,val) in result.iteritems()],dtype)

But I keep getting the error: expected a readable buffer object

The method below works, but it is stupid and obviously won't work for real data. I know there is a more graceful approach; I just can't figure it out.

totable = numpy.array([[key,val] for (key,val) in result.iteritems()])
array=numpy.array([(totable[0,0],totable[0,1]),(totable[1,0],totable[1,1])],dtype)

You could use np.array(list(result.items()), dtype=dtype):

import numpy as np
result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734, 3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266, 6: 0.20941558441558442}

names = ['id','data']
formats = ['f8','f8']
dtype = dict(names = names, formats=formats)
array = np.array(list(result.items()), dtype=dtype)

print(repr(array))

yields

array([(0.0, 1.1181753789488595), (1.0, 0.5566080288678394),
       (2.0, 0.4718269778030734), (3.0, 0.48716683119447185), (4.0, 1.0),
       (5.0, 0.1395076201641266), (6.0, 0.20941558441558442)], 
      dtype=[('id', '<f8'), ('data', '<f8')])

If you don't want to create the intermediate list of tuples, list(result.items()), then you could instead use np.fromiter:

In Python2:

array = np.fromiter(result.iteritems(), dtype=dtype, count=len(result))

In Python3:

array = np.fromiter(result.items(), dtype=dtype, count=len(result))
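
As a rough Python 3 sketch of the np.fromiter approach above, the following builds the structured array straight from the dictionary's items and checks it against the list-based construction. The small result dictionary and the explicit np.dtype object are illustrative assumptions, not part of the original question.

```python
import numpy as np

# Illustrative data; any dict mapping numbers to numbers works the same way.
result = {0: 1.5, 1: 0.25, 2: 4.0}
dtype = np.dtype([('id', 'f8'), ('data', 'f8')])

# items() yields (key, value) tuples, which np.fromiter consumes one record
# at a time; count lets NumPy allocate the output once up front.
from_iter = np.fromiter(result.items(), dtype=dtype, count=len(result))

# The same array built through an intermediate list of tuples.
from_list = np.array(list(result.items()), dtype=dtype)

print(from_iter.dtype.names)            # ('id', 'data')
print((from_iter == from_list).all())   # True
```

Passing count is optional, but without it NumPy has to grow the output as it goes.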

Why using the list [key,val] does not work:

By the way, your attempt,

numpy.array([[key,val] for (key,val) in result.iteritems()],dtype)

was very close to working. If you change the list [key, val] to the tuple (key, val), then it would have worked. Of course,

numpy.array([(key,val) for (key,val) in result.iteritems()], dtype)

is the same thing as

numpy.array(result.items(), dtype)

in Python2, or

numpy.array(list(result.items()), dtype)

in Python3.


np.array treats lists differently than tuples. Robert Kern explains:

As a rule, tuples are considered "scalar" records and lists are recursed upon. This rule helps numpy.array() figure out which sequences are records and which are other sequences to be recursed upon; i.e. which sequences create another dimension and which are the atomic elements.

Since (0.0, 1.1181753789488595) is considered one of those atomic elements, it should be a tuple, not a list.
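
To make the tuple-versus-list rule concrete, here is a minimal sketch with made-up sample data (the three-entry result dictionary is an assumption for illustration). It builds the records from tuples, and it shows that rows arriving as lists can be repaired by converting each one to a tuple first.

```python
import numpy as np

result = {0: 1.5, 1: 0.25, 2: 4.0}
dtype = np.dtype([('id', 'f8'), ('data', 'f8')])

# Tuples are treated as records: one structured element per (key, value) pair.
records = np.array([(k, v) for k, v in result.items()], dtype=dtype)
print(records.shape)            # (3,)
print(records['id'].tolist())   # [0.0, 1.0, 2.0]

# Rows that arrive as lists can be converted to tuples before the call.
rows = [[k, v] for k, v in result.items()]
fixed = np.array([tuple(row) for row in rows], dtype=dtype)
print((records == fixed).all())   # True
```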

Even simpler, if you accept using pandas:

import pandas
result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734, 3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266, 6: 0.20941558441558442}
df = pandas.DataFrame(result, index=[0])
print(df)

gives:

          0         1         2         3  4         5         6
0  1.118175  0.556608  0.471827  0.487167  1  0.139508  0.209416

Let me propose an improved method for when the values of the dictionary are lists of the same length:

import numpy

def dctToNdarray(dd, szFormat='f8'):
    '''
    Convert a 'rectangular' dictionary to a numpy structured array.
    entry
        dd : dictionary whose values are lists of the same length
    return
        data : numpy structured ndarray with one field per key
    '''
    # list() is needed in Python 3, where keys() returns a view.
    names = list(dd.keys())
    formats = [szFormat] * len(names)
    dtype = dict(names=names, formats=formats)
    # Zip the columns together so that each row becomes one tuple (record).
    values = list(zip(*(dd[k] for k in names)))
    return numpy.array(values, dtype=dtype)

dd = {'a':[1,2.05,25.48],'b':[2,1.07,9],'c':[3,3.01,6.14]}
data = dctToNdarray(dd)
print(data.dtype.names)
print(data)

I would prefer storing keys and values in separate arrays. This is often more practical. Structures of arrays are a perfect replacement for arrays of structures. Most of the time you have to process only a subset of your data (in this case keys or values), and operating on just one of the two arrays is more efficient than operating on a single array in which the two are stored together.

But in case this is not possible, I would suggest using an array laid out by column instead of by row. This way you get the same benefit as having two arrays, but packed into one.

import numpy as np
result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734, 3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266, 6: 0.20941558441558442}

names = 0
values = 1
array = np.empty(shape=(2, len(result)), dtype=float)
array[names] = list(result.keys())    # list() is required in Python 3
array[values] = list(result.values())

But my favorite is this (simpler):

import numpy as np
result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734, 3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266, 6: 0.20941558441558442}

arrays = {'names': np.array(list(result.keys()), dtype=float),
          'values': np.array(list(result.values()), dtype=float)}
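
If the separate arrays eventually have to be handed to something that requires a structured array, as in the question, one possible sketch is to allocate the structured array and fill it field by field. The field names 'id' and 'data' are taken from the question; the sample data is an assumption.

```python
import numpy as np

result = {0: 1.5, 1: 0.25, 2: 4.0}

# Keys and values kept as two plain arrays (structure of arrays).
keys = np.array(list(result.keys()), dtype=float)
values = np.array(list(result.values()), dtype=float)

# Allocate the structured array, then fill it one field (column) at a time.
structured = np.empty(len(result), dtype=[('id', 'f8'), ('data', 'f8')])
structured['id'] = keys
structured['data'] = values

print(structured.dtype.names)   # ('id', 'data')
print(structured.shape)         # (3,)
```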

Similar to the accepted answer. If you want to create an array from dictionary keys:

np.array(tuple(result.keys()))

If you want to create an array from dictionary values:

np.array(tuple(result.values()))
