Vectorizing NumPy slicing operations

Question

Suppose I have a NumPy vector,

A = zeros(100)

I divide it into sub-vectors using a list of break points, which are indices into A, for example,

breaks = linspace(0, 100, 11, dtype=int)

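So the i-th sub-vector lies between indices breaks[i] (inclusive) and breaks[i+1] (exclusive). The breaks are not necessarily equally spaced; this is just an example. However, they will always be strictly increasing.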
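To make the indexing convention concrete, here is a small sketch with made-up, uneven break points. It assumes breaks[0] == 0 and breaks[-1] == len(A). np.split with only the interior break points yields exactly the half-open slices A[breaks[i]:breaks[i+1]]; it still returns a Python list, so it only illustrates the partition and is not itself a vectorized answer.

```python
import numpy as np

A = np.arange(10.0)
# Hypothetical uneven break points: strictly increasing, from 0 to len(A)
breaks = np.array([0, 3, 4, 8, 10])
assert np.all(np.diff(breaks) > 0)

# np.split cuts before each interior break point, giving A[breaks[i]:breaks[i+1]]
pieces = np.split(A, breaks[1:-1])
for i, piece in enumerate(pieces):
    assert np.array_equal(piece, A[breaks[i]:breaks[i + 1]])

print([p.tolist() for p in pieces])
# → [[0.0, 1.0, 2.0], [3.0], [4.0, 5.0, 6.0, 7.0], [8.0, 9.0]]
```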

Now I want to operate on these sub-vectors. For example, if I want to set all elements of the i-th sub-vector to i, I might do:

for i in range(len(breaks) - 1):
    A[breaks[i] : breaks[i+1]] = i

Or I might want to compute the means of the sub-vectors:

b = empty(len(breaks) - 1)
for i in range(len(breaks) - 1):
    b[i] = A[breaks[i] : breaks[i+1]].mean()

And so on.

How can I avoid the for loops and vectorize these operations instead?

Answer 1

You can use a simple np.cumsum:

import numpy as np

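# Form an integer zeros array of the same length as the input array
# (integer dtype, so the result can also serve as np.bincount labels) and
# place ones at positions where intervals change
A1 = np.zeros(len(A), dtype=int)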
A1[breaks[1:-1]] = 1

# Perform cumsum along it to create a staircase like array, as the final output
out = A1.cumsum()

Sample run:

In [115]: A
Out[115]: array([3, 8, 0, 4, 6, 4, 8, 0, 2, 7, 4, 9, 3, 7, 3, 8, 6, 7, 1, 6])

In [116]: breaks
Out[116]: array([ 0,  4,  9, 11, 18, 20])

In [142]: out
Out[142]: array([0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4])
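The sample run above can be reproduced as a self-contained script that also checks the cumsum labels against the question's first loop. Like the answer, this sketch assumes breaks[0] == 0 and breaks[-1] == len(A); the integer dtype keeps the labels usable as indices.

```python
import numpy as np

# Sample data from the run above
A = np.array([3, 8, 0, 4, 6, 4, 8, 0, 2, 7, 4, 9, 3, 7, 3, 8, 6, 7, 1, 6])
breaks = np.array([0, 4, 9, 11, 18, 20])

# A 1 at the start of every sub-vector except the first ...
A1 = np.zeros(len(A), dtype=int)
A1[breaks[1:-1]] = 1
# ... so the running sum counts how many break points have been passed
out = A1.cumsum()

# Reference: the question's first loop, written into a separate array
expected = np.empty(len(A), dtype=int)
for i in range(len(breaks) - 1):
    expected[breaks[i]:breaks[i + 1]] = i

assert np.array_equal(out, expected)
print(out)  # → [0 0 0 0 1 1 1 1 1 2 2 3 3 3 3 3 3 3 4 4]
```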

If you want the means of these sub-vectors of A, you can use np.bincount:

mean_vals = np.bincount(out, weights=A)/np.bincount(out)
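A runnable check of the bincount approach on the same sample data (cast to float here, as a stand-in for real data): bincount with weights sums A per label, plain bincount counts elements per label, and their ratio matches the question's mean loop with b[i] assigned per sub-vector. It relies on every sub-vector being non-empty, which strictly increasing breaks guarantee.

```python
import numpy as np

A = np.array([3, 8, 0, 4, 6, 4, 8, 0, 2, 7, 4, 9, 3, 7, 3, 8, 6, 7, 1, 6], dtype=float)
breaks = np.array([0, 4, 9, 11, 18, 20])

# Integer sub-vector labels via the cumsum trick from Answer 1
out = np.zeros(len(A), dtype=int)
out[breaks[1:-1]] = 1
out = out.cumsum()

# Per-label sums divided by per-label counts give the per-sub-vector means
mean_vals = np.bincount(out, weights=A) / np.bincount(out)

# Loop version from the question
b = np.empty(len(breaks) - 1)
for i in range(len(breaks) - 1):
    b[i] = A[breaks[i]:breaks[i + 1]].mean()

assert np.allclose(mean_vals, b)
```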

If you wish to extend this and use custom functions instead, you may want to look at numpy_groupies, a Python/NumPy equivalent of MATLAB's accumarray; its source code is publicly available.
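numpy_groupies itself is not shown here. For the narrower case where the custom reduction is a binary ufunc, plain NumPy can already do accumarray-style grouping with ufunc.at, which accumulates in place without buffering, so every repeated label is applied and the labels do not need to be sorted. The helper accum_ufunc below is hypothetical (not part of NumPy or numpy_groupies), and the labels are built with np.repeat as in Answer 3; arbitrary Python functions still need numpy_groupies or a loop.

```python
import numpy as np

def accum_ufunc(labels, values, ufunc, fill_value):
    """Hypothetical helper: reduce `values` per integer label with a binary ufunc."""
    result = np.full(labels.max() + 1, fill_value, dtype=values.dtype)
    # For every k: result[labels[k]] = ufunc(result[labels[k]], values[k]), unbuffered
    ufunc.at(result, labels, values)
    return result

A = np.array([3, 8, 0, 4, 6, 4, 8, 0, 2, 7, 4, 9, 3, 7, 3, 8, 6, 7, 1, 6])
breaks = np.array([0, 4, 9, 11, 18, 20])
labels = np.repeat(np.arange(len(breaks) - 1), np.diff(breaks))

sums = accum_ufunc(labels, A, np.add, 0)
# Start from the smallest representable integer so any element beats the fill value
maxima = accum_ufunc(labels, A, np.maximum, np.iinfo(A.dtype).min)
```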

Answer 2

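There really isn't a single answer to your question, but there are several techniques you can use as building blocks. Here is another one you may find helpful: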

All NumPy ufuncs have a .reduceat method, which you can use for some of these computations:

>>> a = np.arange(100)
>>> breaks = np.linspace(0, 100, 11, dtype=np.intp)
>>> counts = np.diff(breaks)
>>> counts
array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
>>> sums = np.add.reduceat(a, breaks[:-1], dtype=float)
>>> sums
array([  45.,  145.,  245.,  345.,  445.,  545.,  645.,  745.,  845.,  945.])
>>> sums / counts  # i.e. the mean
array([  4.5,  14.5,  24.5,  34.5,  44.5,  54.5,  64.5,  74.5,  84.5,  94.5])
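The same idea with uneven break points, reusing the sample data from Answer 1. reduceat only needs the start index of each sub-vector, so breaks[:-1] is passed; this assumes breaks[-1] == len(a). It also relies on the breaks being strictly increasing: for a repeated or decreasing index, reduceat returns the single element a[indices[i]] rather than an empty reduction. Any binary ufunc can be used, e.g. np.maximum for per-sub-vector maxima.

```python
import numpy as np

a = np.array([3, 8, 0, 4, 6, 4, 8, 0, 2, 7, 4, 9, 3, 7, 3, 8, 6, 7, 1, 6])
breaks = np.array([0, 4, 9, 11, 18, 20])
starts = breaks[:-1]       # reduceat only needs where each sub-vector starts
counts = np.diff(breaks)   # lengths of the sub-vectors

sums = np.add.reduceat(a, starts, dtype=float)
means = sums / counts
maxima = np.maximum.reduceat(a, starts)

# Same results as an explicit loop over the slices
for i in range(len(counts)):
    chunk = a[breaks[i]:breaks[i + 1]]
    assert np.isclose(means[i], chunk.mean())
    assert maxima[i] == chunk.max()
```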

Answer 3

You can use np.repeat:

In [35]: np.repeat(np.arange(0, len(breaks)-1), np.diff(breaks))
Out[35]: 
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4,
       4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6,
       6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9,
       9, 9, 9, 9, 9, 9, 9, 9])
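np.repeat produces the same per-element labels as the cumsum approach in Answer 1, and with them the question's first loop becomes a single assignment. The labels also let per-sub-vector results (here computed with np.bincount, as in Answer 1) be broadcast back to element level, for example to subtract each sub-vector's own mean. The break points and the random data below are made up for illustration.

```python
import numpy as np

A = np.zeros(100)
breaks = np.array([0, 5, 30, 31, 70, 100])  # hypothetical uneven, strictly increasing breaks

# Label of each element = index of the sub-vector it belongs to
labels = np.repeat(np.arange(len(breaks) - 1), np.diff(breaks))

# Vectorized form of: for i: A[breaks[i]:breaks[i+1]] = i
A[:] = labels

# Per-sub-vector means, broadcast back to every element through the labels
x = np.random.default_rng(0).random(100)
seg_means = np.bincount(labels, weights=x) / np.bincount(labels)
centered = x - seg_means[labels]  # each element minus the mean of its own sub-vector
```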

To compute arbitrary binned statistics, you can use scipy.stats.binned_statistic:

import numpy as np
import scipy.stats as stats

breaks = np.linspace(0, 100, 11, dtype=int)
A = np.random.random(100)

means, bin_edges, binnumber = stats.binned_statistic(
    x=np.arange(len(A)), values=A, statistic='mean', bins=breaks)

stats.binned_statistic can compute the mean, median, count, and sum; alternatively, to compute an arbitrary statistic per bin, you can pass a callable to the statistic parameter:

def func(values):
    return values.mean()

funcmeans, bin_edges, binnumber = stats.binned_statistic(
    x=np.arange(len(A)), values=A, statistic=func, bins=breaks)

assert np.allclose(means, funcmeans)

Vectorizing NumPy slicing operations

Problem description

3 solutions

Solution 1
7 2015-04-27 11:41:26

Solution 2
6 accepted 2015-04-27 13:35:50

Solution 3
3 2015-04-27 11:32:15