Making nested 'for' loops more pythonic

I'm relatively new to Python and am wondering how to make the following more efficient by avoiding explicit nested 'for' loops and using Python's implicit looping instead. I'm working with image data, and in this case I'm trying to speed up my k-means algorithm. Here's a sample of what I'm trying to do:

# shape of image will be something like 140, 150, 3
num_sets, rows_per_set, num_columns = image_values.shape

# set_idx rather than set, to avoid shadowing the built-in set type
for set_idx in range(num_sets):
    for row in range(rows_per_set):
        pos = np.argmin(calc_euclidean(rgb_[set_idx][row], means_list))
        buckets[pos].append(image_values[set_idx][row])

What I have today works well, but I'd like to make it more efficient.

Feedback and recommendations are greatly appreciated.

Here is a vectorised solution. I'm almost certain I got your dimensions muddled up (3 is not really the number of columns, is it?), but the principle should be recognisable anyway:

For demonstration, the buckets only collect the flat indices (computed from set and row) rather than the pixel values themselves.
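The distance table in the code that follows appears to rest on the usual expansion of the squared Euclidean distance. Writing $x$ for a pixel's RGB vector and $m_j$ for the $j$-th mean (symbols introduced here for illustration only, they do not appear in the code):

```latex
\|x - m_j\|^2 = \|x\|^2 + \|m_j\|^2 - 2\, x \cdot m_j
```

The first two terms correspond to the `np.add.outer` of the per-pixel and per-mean sums of squares, and the cross term to `np.dot(rgb_, means_list.T)`. Because the square root is monotonic, the argmin over squared distances picks the same cluster as the argmin over true Euclidean distances.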

import numpy as np

k = 6
rgb_ = np.random.randint(0, 9, (140, 150, 3))
means_list = np.random.randint(0, 9, (k, 3))

# compute distance table; use some algebra to leverage highly optimised
# dot product
squared_dists = np.add.outer((rgb_*rgb_).sum(axis=-1),
                             (means_list*means_list).sum(axis=-1)) \
    - 2*np.dot(rgb_, means_list.T)
# find best cluster
best = np.argmin(squared_dists, axis=-1)

# find group sizes
counts = np.bincount(best.ravel())
# translate to block boundaries
bnds = np.cumsum(counts[:-1])
# group indices by best cluster; argpartition should be
# a bit cheaper than argsort
chunks = np.argpartition(best.ravel(), bnds)
# split into buckets
buckets = np.split(chunks, bnds)

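# check against the original nested loops

num_sets, rows_per_set, num_columns = rgb_.shape

def calc_euclidean(a, b):
    # squared distance; the square root does not change the argmin
    return ((a-b)**2).sum(axis=-1)

# set_idx rather than set, to avoid shadowing the built-in set type
for set_idx in range(num_sets):
    for row in range(rows_per_set):
        pos = np.argmin(calc_euclidean(rgb_[set_idx][row], means_list))
        assert pos == best[set_idx, row]
        # buckets hold flat indices: rows_per_set*set_idx + row
        assert rows_per_set*set_idx+row in buckets[pos]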
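
Since the buckets above hold flat indices rather than pixels, one way to get back to what the question's loop collected is to index a flattened view of the image. The following is a minimal sketch under some assumptions, not the answer's own method: it uses plain broadcasting for the distances and a stable `argsort` in place of `argpartition`, and the names `pixels`, `bucket_pixels` and `new_means` are introduced here purely for illustration. It also sketches the k-means update step, leaving the mean of an empty cluster unchanged (one possible convention among several).

```python
import numpy as np

rng = np.random.default_rng(0)
k = 6
rgb_ = rng.integers(0, 9, size=(140, 150, 3))
means_list = rng.integers(0, 9, size=(k, 3))

# Flatten the image to one pixel per row: shape (140*150, 3).
pixels = rgb_.reshape(-1, 3)

# Squared distance of every pixel to every mean via broadcasting:
# (N, 1, 3) - (1, k, 3) -> (N, k, 3), summed over the colour axis.
sq_dists = ((pixels[:, None, :] - means_list[None, :, :]) ** 2).sum(axis=-1)
best = sq_dists.argmin(axis=-1)

# A stable argsort groups the flat indices by cluster and keeps the
# original pixel order inside each group.
order = np.argsort(best, kind="stable")
counts = np.bincount(best, minlength=k)
buckets = np.split(order, np.cumsum(counts)[:-1])

# Fancy indexing turns each bucket of flat indices into an (n_i, 3) array.
bucket_pixels = [pixels[idx] for idx in buckets]

# Update step: accumulate per-cluster channel sums, then divide by the
# cluster sizes. Empty clusters keep their previous mean (an assumption).
sums = np.zeros((k, 3))
np.add.at(sums, best, pixels)
new_means = np.where(counts[:, None] > 0,
                     sums / np.maximum(counts, 1)[:, None],
                     means_list)
```

With `minlength=k`, `buckets` always has exactly `k` entries, even if some cluster attracts no pixels. The broadcasting step builds an intermediate array of shape (N, k, 3), so for large images or many clusters the dot-product formulation above is likely to be lighter on memory.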
