How to convert the values in a Python defaultdict to a NumPy array?

I want multiple values to belong to the same key, so I used a Python defaultdict to work around this. However, since the values in the defaultdict are now nested lists, how do I make each element of the nested lists a row of a NumPy ndarray?

Let's say my defaultdict looks like this:

from collections import defaultdict

my_dict = defaultdict(list)

# inside some for loop:
my_dict[key].append(value)  # key is a string; value is a NumPy array of shape (1, 10)

I guess the slowest way would be to use a nested for loop like:

data = np.empty((0,10),np.uint8)
for i in my_dict:
    for j in my_dict[i]:
        data = np.append(data,j,axis=0)   

Is there a faster way to do this?

Instead of using defaultdict(list), use the setdefault method; this will spare you the nested list:

my_dict = dict()
for key, value in values:
    my_dict[key] = np.append(my_dict.setdefault(key, value), value)

data = np.array(list(my_dict.values()))

You should have provided an example, but I think the following is as general as your code implies.

In [131]: from collections import defaultdict
In [132]: dd = defaultdict(list)
In [133]: dd[1].append(np.ones((1,5),int))
In [134]: dd[2].append(2*np.ones((1,5),int))
In [135]: dd[1].append(3*np.ones((1,5),int))

In [136]: dd
Out[136]: 
defaultdict(list,
            {1: [array([[1, 1, 1, 1, 1]]), array([[3, 3, 3, 3, 3]])],
             2: [array([[2, 2, 2, 2, 2]])]})

Several answers suggested making an array from:

In [137]: list(dd.values())
Out[137]: 
[[array([[1, 1, 1, 1, 1]]), array([[3, 3, 3, 3, 3]])],
 [array([[2, 2, 2, 2, 2]])]]

But with the possibility that there is more than one array in each list, that won't work.

We can flatten the nested lists with something similar to your code, but with a faster list append:

In [140]: alist = []
     ...: for i in dd:
     ...:     for a in dd[i]:
     ...:         alist.append(a)
     ...:         
In [141]: alist
Out[141]: [array([[1, 1, 1, 1, 1]]), array([[3, 3, 3, 3, 3]]), array([[2, 2, 2, 2, 2]])]

We can make a 2d array from this (provided the subarrays match in shape):

In [142]: np.vstack(alist)
Out[142]: 
array([[1, 1, 1, 1, 1],
       [3, 3, 3, 3, 3],
       [2, 2, 2, 2, 2]])

or:

In [144]: np.array(alist).shape
Out[144]: (3, 1, 5)
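
If the 2d layout is the goal, the (3, 1, 5) result can be collapsed afterwards. The following is a minimal sketch, assuming every subarray has shape (1, n); `alist` is rebuilt here only so the snippet runs on its own, and the names `stacked`, `flat`, and `same` are illustrative:

```python
import numpy as np

# Same contents as the alist built above, recreated so this runs standalone.
alist = [np.ones((1, 5), int), 3 * np.ones((1, 5), int), 2 * np.ones((1, 5), int)]

stacked = np.array(alist)                      # new leading axis: shape (3, 1, 5)
flat = stacked.reshape(-1, stacked.shape[-1])  # drop the length-1 axis: shape (3, 5)

# Joining along the existing first axis gives the same 2d array directly.
same = np.concatenate(alist, axis=0)           # shape (3, 5)
```

Both routes should produce the same rows as np.vstack(alist); np.array stacks along a new axis, while np.concatenate and np.vstack join along the existing one.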

As a general rule, repeated np.append is inefficient. List append (or a list comprehension) is best when iteration is unavoidable.
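
As a sketch of the list comprehension variant (assuming, as in the example above, that every stored array has the same number of columns), the nested loop can be written as one comprehension followed by a single np.vstack call. The names `rows` and `data` are illustrative:

```python
from collections import defaultdict

import numpy as np

dd = defaultdict(list)
dd[1].append(np.ones((1, 5), int))
dd[2].append(2 * np.ones((1, 5), int))
dd[1].append(3 * np.ones((1, 5), int))

# Flatten the per-key lists into one list of arrays, then stack once.
rows = [a for arrays in dd.values() for a in arrays]
data = np.vstack(rows)  # shape (3, 5); rows follow dict insertion order
```

The array is allocated once at the end, instead of being copied on every iteration as with repeated np.append.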

Guy's

Trying to recreate the dict with @Guy's suggestion:

In [147]: my_dict = dict()
     ...: key,value=(1,np.ones((1,5),int)); my_dict[key]= np.append(my_dict.setdefault(key, value), value)

I would prefer to use np.hstack here (np.append is misused too often).

In [148]: key,value=(2,2*np.ones((1,5),int)); my_dict[key]= np.append(my_dict.setdefault(key, value), value)    
In [149]: key,value=(1,3*np.ones((1,5),int)); my_dict[key]= np.append(my_dict.setdefault(key, value), value)

In [150]: my_dict
Out[150]: 
{1: array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3]),
 2: array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2])}

This has duplicated values for some of the additions. And making an array from list(my_dict.values()) is no easier.
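
The duplication comes from the first insertion for each key: setdefault stores and returns `value`, and np.append then joins that same `value` to itself. One possible way to avoid it, sketched here with an explicit membership test and np.vstack (chosen because the question asks for rows; this is not part of @Guy's answer, and `pairs` is a hypothetical stand-in for the loop's input):

```python
import numpy as np

# Hypothetical (key, value) input, matching the examples used above.
pairs = [
    (1, np.ones((1, 5), int)),
    (2, 2 * np.ones((1, 5), int)),
    (1, 3 * np.ones((1, 5), int)),
]

my_dict = {}
for key, value in pairs:
    if key in my_dict:
        # Key already present: add the new array as another row.
        my_dict[key] = np.vstack((my_dict[key], value))
    else:
        # First occurrence: store the array as-is, without appending it to itself.
        my_dict[key] = value

data = np.vstack(list(my_dict.values()))  # shape (3, 5)
```

This still copies the per-key array on every addition, so it is likely slower than collecting lists and stacking once.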

We could collect the dict values as arrays, but it's not as simple as with lists. An array doesn't have a simple "empty", and doesn't have an in-place "append".

In [157]: dd = defaultdict(lambda: np.zeros([0,5],int))
In [158]: dd[1]=np.vstack((dd[1],(np.ones((1,5),int))))
In [159]: dd[2]=np.vstack((dd[2],(2*np.ones((1,5),int))))
In [160]: dd[3]=np.vstack((dd[3],(3*np.ones((1,5),int))))

In [161]: dd
Out[161]: 
defaultdict(<function __main__.<lambda>()>,
            {1: array([[1, 1, 1, 1, 1]]),
             2: array([[2, 2, 2, 2, 2]]),
             3: array([[3, 3, 3, 3, 3]])})

In [162]: np.vstack(list(dd.values()))
Out[162]: 
array([[1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3]])

This avoids an iteration after the dict is constructed, but the dict construction is more complex and slower. So I don't think it helps.
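
If per-key arrays are wanted in the end, a middle ground is to keep collecting lists and convert each key once after the loop. This is only a sketch under the same assumption as before (all arrays share a column count); `per_key` is an illustrative name:

```python
from collections import defaultdict

import numpy as np

dd = defaultdict(list)
dd[1].append(np.ones((1, 5), int))
dd[2].append(2 * np.ones((1, 5), int))
dd[1].append(3 * np.ones((1, 5), int))

# One np.vstack per key, done once after all appends are finished.
per_key = {key: np.vstack(arrays) for key, arrays in dd.items()}
```

Here per_key[1] holds two rows and per_key[2] holds one, and the cheap list append is kept inside the loop.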
