
Python list comprehension for Numpy

I'm looking for a list-comprehension method or something similar in Numpy to eliminate the use of a for-loop. For example, index_values is a Python dictionary list of lists (each list containing a different number of index values) and s is a numpy vector:

for i in range(33):
    s[index_values[i]] += 4.1

Is there a method available that allows eliminating the for-loop?

I don't fully understand what kind of object index_values is. But if it were an ndarray, or could be converted to an ndarray, you could just do this:

>>> import numpy
>>> s = numpy.arange(20)
>>> index_values = (numpy.random.random((3, 3)) * 20).astype('i')
>>> s[index_values] = 4
>>> s
array([ 0,  1,  4,  4,  4,  5,  6,  4,  8,  4,  4, 11, 12, 
       13,  4, 15,  4,  4,  4, 19])
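
The session above uses random indices, so its output changes from run to run. A minimal reproducible sketch of the same idea, assuming a hand-picked (hypothetical) index array, might look like this:

```python
import numpy as np

s = np.arange(20)

# Hypothetical 3x3 integer index array; its shape does not matter for assignment.
index_values = np.array([[2, 3, 4],
                         [7, 9, 10],
                         [14, 16, 17]])

# Fancy-index assignment: every listed position in s is set to 4.
s[index_values] = 4
print(s)
```

Positions not listed in index_values keep their original values.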

Edit: But it seems that won't work in this case. On the basis of your edits and comments, here's a method I think might work for you. A random list of lists with varying lengths...

>>> import random
>>> index_values = [list(range(x, x + random.randrange(1, 5)))
...                 for x in [random.randrange(0,50) for y in range(33)]]

...isn't hard to convert into an array:

>>> import itertools
>>> index_value_array = numpy.fromiter(itertools.chain(*index_values),
...                                    dtype='i')

If you know the length of the array, specify the count for better performance:

>>> index_value_array = numpy.fromiter(itertools.chain(*index_values),
...                                    dtype='i', count=83)
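
As a sketch of the flattening step on a small, fixed list of lists (the values are made up for illustration), the count can be computed from the list lengths rather than hard-coded. Here itertools.chain.from_iterable is used in place of chain(*index_values); it yields the same items without unpacking the outer list into arguments.

```python
import itertools
import numpy as np

# Hypothetical ragged list of index lists.
index_values = [[0, 1, 2], [2, 3], [5], [1, 2, 3, 4]]

# Total number of indices, so fromiter can preallocate the output array.
total = sum(len(sub) for sub in index_values)

index_value_array = np.fromiter(
    itertools.chain.from_iterable(index_values),
    dtype=np.intp,   # platform integer type used for indexing
    count=total,
)
print(index_value_array)
```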

Since your edit indicates that you want histogram-like behavior, simple indexing won't do, as pointed out by Robert Kern. So use numpy.histogram:

>>> hist = numpy.histogram(index_value_array, bins=range(0, 51))

histogram is really designed for floating-point data. This means that bins has to be a bit larger than expected: the last bin also includes its right edge, so 48 and 49 would land in the same bin if we used the more intuitive range(0, 50). The result is a tuple with an array of n counts and an array of n + 1 bin borders:

>>> hist
(array([2, 2, 1, 2, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 5, 5, 5, 3, 3, 
        3, 3, 3, 2, 1, 0, 2, 3, 3, 1, 0, 2, 3, 2, 2, 2, 3, 2, 1, 1, 2, 2, 
        2, 0, 0, 0, 1, 0]), 
 array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
        34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]))
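
The bin-edge behavior described above can be checked on a tiny, made-up sample. This sketch uses np.arange for the edges (equivalent to the range calls in the text) and, as an aside that is not part of the original answer, compares the result with np.bincount, which counts non-negative integers directly.

```python
import numpy as np

# Hypothetical sample of integer indices in 0..49.
values = np.array([0, 1, 1, 48, 49, 49])

# 51 edges give 50 unit-wide bins, one per integer 0..49.
counts_51, edges_51 = np.histogram(values, bins=np.arange(0, 51))

# 50 edges give only 49 bins; the last bin is [48, 49] and holds both 48 and 49.
counts_50, edges_50 = np.histogram(values, bins=np.arange(0, 50))

print(len(counts_51), counts_51[48], counts_51[49])
print(len(counts_50), counts_50[48])

# For non-negative integers, bincount yields the same per-integer counts.
same = np.array_equal(counts_51, np.bincount(values, minlength=50))
print(same)
```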

Now we can scale the counts by a factor of 4.1 and perform vector addition:

>>> s = numpy.arange(50, dtype='f')
>>> hist[0] * 4.1 + s
array([  8.2,   9.2,   6.1,  11.2,   8.1,   5. ,   6. ,   7. ,  12.1,
        13.1,  14.1,  15.1,  16.1,  13. ,  18.1,  19.1,  20.1,  37.5,
        38.5,  39.5,  32.3,  33.3,  34.3,  35.3,  36.3,  33.2,  30.1,
        27. ,  36.2,  41.3,  42.3,  35.1,  32. ,  41.2,  46.3,  43.2,
        44.2,  45.2,  50.3,  47.2,  44.1,  45.1,  50.2,  51.2,  52.2,
        45. ,  46. ,  47. ,  52.1,  49. ])

I have no idea if this suits your purposes, but it seems like a good approach, and it should run at near-C speed since it uses only numpy and itertools.
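
To see that the histogram approach gives the same result as the original for-loop, here is a small end-to-end sketch on made-up data. It assumes no index repeats within a single sub-list (repeats across sub-lists are fine). The np.add.at call at the end is an alternative that is not part of the answer above; it performs unbuffered in-place addition, so repeated indices accumulate.

```python
import itertools
import numpy as np

# Hypothetical ragged index lists; index 2 appears in three of them.
index_values = [[0, 1, 2], [2, 3], [5], [1, 2, 3, 4]]
n = 8  # length of s, chosen for illustration

# Reference: the for-loop from the question.
s_loop = np.arange(n, dtype=float)
for idx in index_values:
    s_loop[idx] += 4.1

# Histogram approach: count how often each index occurs, then scale and add.
flat = np.fromiter(itertools.chain.from_iterable(index_values), dtype=np.intp)
counts, _ = np.histogram(flat, bins=np.arange(n + 1))
s_hist = np.arange(n, dtype=float) + counts * 4.1

# Alternative (not in the original answer): unbuffered in-place addition.
s_at = np.arange(n, dtype=float)
np.add.at(s_at, flat, 4.1)

print(np.allclose(s_loop, s_hist), np.allclose(s_loop, s_at))
```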

What about:

from functools import reduce  # reduce is no longer a builtin on Python 3
s[reduce(lambda x,y: x+y, [index_values[x] for x in range(33)], [])] = 4.1
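
A small sketch of what this one-liner does, on made-up data: reduce concatenates the sub-lists into one flat index list. With plain assignment, repeated indices are harmless, but swapping in += would add 4.1 only once per distinct index, because in-place operations through a fancy index are buffered.

```python
from functools import reduce
import numpy as np

# Hypothetical index lists; index 2 appears twice.
index_values = [[0, 1, 2], [2, 3], [5]]

# Concatenate the sub-lists into one flat list of indices.
flat = reduce(lambda x, y: x + y, index_values, [])

# Plain assignment: duplicates simply write the same value again.
s = np.zeros(6)
s[flat] = 4.1

# Buffered in-place addition: index 2 receives 4.1 once, not twice.
t = np.zeros(6)
t[flat] += 4.1

print(flat, s[2], t[2])
```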

