
numpy - efficient value counts in 2D and 3D arrays

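I'm writing a scheduler for a small group game. I have a schedule for 32-4-8 (32 players, 4 players per group, 8 rounds) with no repeated partners or opponents. However, because of venue constraints, only 28 players (7 groups) can play in each round. So I have to modify the schedule so that every player gets 7 games and 1 bye, with as few repeated partners or opponents as possible.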

```python
import numpy as np

sched = np.array([
       [[ 3, 28, 17, 14],
        [23, 30, 22,  1],
        [ 2,  5, 27, 25],
        [20,  8, 10, 16],
        [ 0, 24, 26, 11],
        [ 4, 21, 31,  7],
        [19,  6, 29, 15],
        [13, 18, 12,  9]],

       [[20, 15, 24, 31],
        [ 3, 21, 16, 13],
        [ 6, 30,  4,  5],
        [28,  8,  0,  7],
        [25, 29, 17, 23],
        [14,  9,  2, 22],
        [27, 12,  1, 11],
        [26, 10, 19, 18]],

       [[10,  4, 23, 12],
        [ 9, 28, 25, 31],
        [ 5, 13, 22,  8],
        [15,  7, 30,  2],
        [16, 19, 11, 14],
        [18, 17, 24,  6],
        [21,  0, 27, 20],
        [ 3, 26, 29,  1]],

       [[18, 20, 28,  1],
        [ 8,  9,  3,  4],
        [12, 17, 31,  5],
        [13, 30, 27, 14],
        [19, 25, 24,  7],
        [ 2,  6, 21, 26],
        [10, 11, 29, 22],
        [15, 23,  0, 16]],

       [[22, 21, 25, 15],
        [26, 12, 20, 14],
        [28,  5, 24, 10],
        [11,  6, 31, 13],
        [23, 27,  7,  3],
        [ 0, 19,  9,  1],
        [18, 30,  8, 29],
        [16, 17,  2,  4]],

       [[29, 28, 12, 21],
        [ 9, 16, 27,  6],
        [19, 17, 20, 30],
        [ 2,  8, 24, 23],
        [ 5, 11, 18,  7],
        [26, 13, 25,  4],
        [ 1, 10, 15, 14],
        [ 0, 22, 31,  3]],

       [[31, 19, 27,  8],
        [20,  5, 29,  2],
        [24, 16, 22, 12],
        [25,  3, 10,  6],
        [17,  1,  7, 13],
        [ 4,  0, 14, 18],
        [23, 28, 26, 15],
        [11, 21,  9, 30]],

       [[31, 18,  1, 16],
        [23, 14, 21,  5],
        [ 8,  3, 11, 15],
        [26, 17,  9, 10],
        [30, 12, 25,  0],
        [22, 20,  7,  6],
        [27,  4, 29, 24],
        [13, 19, 28,  2]]
])
```

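To determine the best bye options, I randomly pick one game from each round to be the bye. I then give each bye selection a score (its fitness): the number of players who have exactly one bye. Maximizing this score minimizes the changes I need to make to the schedule.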

```python
def bincount2d(arr, bins=None):
    if bins is None:
        bins = np.max(arr) + 1
    count = np.zeros(shape=[len(arr), bins], dtype=np.int64)
    indexing = np.arange(len(arr))
    for col in arr.T:
        count[indexing, col] += 1
    return count


# randomly sample one game per round as byes
# repeat n times (here 10000)
times = 10000
idx1 = np.tile(np.arange(sched.shape[0]), times)
idx2 = np.random.randint(sched.shape[1], size=sched.shape[0] * times)
# one game is sampled per round, so the middle axis is the number of rounds (sched.shape[0])
population_byes = sched[idx1, idx2].reshape(times, sched.shape[0], sched.shape[2])

# get player counts for byes
# can reshape because interested in # of byes for entire schedule
# so no need to segment players by rounds for these counts
count_shape = (population_byes.shape[0], population_byes.shape[1] * population_byes.shape[2])
counts = bincount2d(population_byes.reshape(count_shape))

# fitness is the number of players with one bye
# the higher the value, the less we need to do to mess with the schedule
fitness = np.apply_along_axis(lambda x: (x == 1).sum(), 1, counts)
byes = population_byes[np.argmax(fitness)]
```
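The scoring has a simple upper bound: with 8 rounds and one 4-player game set aside per round, there are 8 × 4 = 32 bye slots for 32 players, so the best possible fitness is 32 (every player sits out exactly once). A minimal sketch of scoring a single bye selection, where `bye_fitness` is a hypothetical helper and `byes` is assumed to be an (8, 4) array holding one bye game per round:

```python
import numpy as np


def bye_fitness(byes, n_players=32):
    """Hypothetical helper: number of players who sit out exactly once.

    byes: array of shape (rounds, players_per_game), one bye game per round.
    minlength=n_players guarantees one entry per player, even when the
    highest-numbered players never sit out.
    """
    counts = np.bincount(np.ravel(byes), minlength=n_players)
    return int((counts == 1).sum())


# a perfect bye selection: 8 games x 4 players cover all 32 players once
perfect = np.arange(32).reshape(8, 4)
print(bye_fitness(perfect))  # 32

# the worst case: player 0 sits out 32 times, nobody else ever does
print(bye_fitness(np.zeros((8, 4), dtype=np.int64)))  # 0
```

With the question's data, `bye_fitness(sched[np.arange(8), idx])` would score one choice of bye game per round, where `idx` is a hypothetical length-8 array of game indices.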

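My questions are as follows:

(1) Is there an efficient way to account for values that have no counts (I know the indices should run from 0 to 31)? bincount2d has no entries for values in that range that are missing.

(2) Is there a more efficient vectorized way than the np.apply_along_axis line to count the elements equal to 1?

(3) Ultimately, what I would like to do is have the application change the schedule to give everyone a bye by swapping player assignments. How do you swap elements in a 3D array?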

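> (1) Is there an efficient way to account for values that have no counts (I know the indices should run from 0 to 31)? bincount2d has no entries for values in that range that are missing.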

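bincount2d is inefficient because of its memory access pattern. Iterating over arr.T is a lazy transposition in Numpy: each column is a strided view, which is expensive to read. The loop is not efficient either, since every iteration performs scattered (random) accesses into a fairly large count array, which is bad for the CPU caches. That being said, Numpy is not well suited for this kind of computation. The operation can be implemented efficiently with Numba: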

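```python
import numpy as np
import numba as nb

# You may need to adjust the types on your machine (the signature below expects
# C-contiguous int64 arrays). Alternatively, drop the signature so Numba infers the
# types lazily (the first call is then slower because it compiles); cache=True can
# be added to reuse the compiled code across runs.
@nb.njit('int64[:,::1](int64[:,::1], optional(int64))')
def bincount2d_fast(arr, bins=None):
    if bins is None:
        nbins = np.max(arr) + 1
    else:
        nbins = np.int64(bins)
    # zero-initialised, so values that never occur keep a count of 0
    count = np.zeros((arr.shape[0], nbins), dtype=np.int64)
    # row-major traversal: arr is read contiguously, one row at a time
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            count[i, arr[i, j]] += 1
    return count
```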

The code above is about 10 times faster than the original bincount2d function on my machine.
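On the missing-values part of (1): both `bincount2d` and `bincount2d_fast` allocate a zero-filled array with `bins` columns, so passing the known number of players explicitly (`bins=32`) gives every player a column, with a count of 0 for anyone who never appears. If Numba is not an option, a common pure-NumPy alternative (not taken from this answer) is to shift each row into its own block of `bins` values and count everything with a single `np.bincount` call, where `minlength` guarantees the zero counts. A minimal sketch, where `bincount2d_offset` is a hypothetical name and all values are assumed to lie in `[0, bins)` (a larger value would silently spill into the next row's counts):

```python
import numpy as np


def bincount2d_offset(arr, bins):
    """Hypothetical row-wise bincount for a 2D array of non-negative ints.

    Row i is shifted by i * bins so all rows can be counted with one
    np.bincount call; minlength guarantees a column for every value in
    [0, bins), including values that never occur (count 0).
    """
    arr = np.asarray(arr)
    n_rows = arr.shape[0]
    offsets = np.arange(n_rows)[:, None] * bins
    flat = (arr + offsets).ravel()
    return np.bincount(flat, minlength=n_rows * bins).reshape(n_rows, bins)


# usage: 32 players, but most of them never appear in these two rows
demo = np.array([[0, 1, 1, 5],
                 [31, 0, 2, 2]])
counts = bincount2d_offset(demo, bins=32)
print(counts.shape)   # (2, 32)
print(counts[0, :6])  # [1 2 0 0 0 1]
print(counts[1, 31])  # 1
```

Applied to the question's data, this would be `bincount2d_offset(population_byes.reshape(count_shape), bins=32)`.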

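> (2) Is there a more efficient vectorized way than the np.apply_along_axis line to count the elements equal to 1?

Yes. You can operate on the whole 2D array at once and perform the reduction along a given axis. Here is an example:

```python
fitness = (counts == 1).sum(axis=1)
byes = population_byes[np.argmax(fitness)]
```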

This is roughly 30 times faster on my machine.
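To check that the vectorized reduction gives the same result as the `np.apply_along_axis` version, here is a small self-contained comparison on random data. The `counts` array below is a hypothetical stand-in for the real bye counts, and `np.count_nonzero(..., axis=1)` is shown as an equivalent alternative:

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical stand-in for `counts`: 1000 bye samples x 32 players
counts = rng.integers(0, 3, size=(1000, 32))

# original approach: one Python-level function call per row
slow = np.apply_along_axis(lambda x: (x == 1).sum(), 1, counts)

# vectorized: one boolean mask over the whole array, reduced along axis 1
fast = (counts == 1).sum(axis=1)

# equivalent alternative on the same boolean mask
fast2 = np.count_nonzero(counts == 1, axis=1)

best = np.argmax(fast)  # index of the bye sample with the most single-bye players
```

The speedup comes from replacing 1000 Python callbacks with a single pass over the mask in compiled code.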

> (3) Ultimately, what I would like to do is have the application change the schedule to give everyone a bye by swapping player assignments. How do you swap elements in a 3D array?

A straightforward solution is to use Numba again with plain loops. Another solution is to store the values to swap in a temporary array and use indirect (fancy) indexing adapted to your exact needs (as @WholeBrain proposed). Something like:

```python
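# all_r1, all_g1, all_s1, etc. are 1D integer Numpy arrays holding the (round, game, slot)
# coordinates of the items to swap. Fancy indexing returns copies, so the right-hand side
# is fully read before anything is written.
arr[all_r2, all_g2, all_s2], arr[all_r1, all_g1, all_s1] = arr[all_r1, all_g1, all_s1], arr[all_r2, all_g2, all_s2]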
```
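For the bye problem specifically, swaps would normally happen inside a single round, so that each player still appears exactly once per round. A minimal sketch of that, using a hypothetical helper `swap_players` (not part of the answer) that locates both players with `np.argwhere` and assumes each of them appears exactly once in the given round:

```python
import numpy as np


def swap_players(sched, rnd, p, q):
    """Hypothetical helper: swap players p and q within round `rnd`, in place.

    sched has shape (rounds, games, slots). Swapping inside one round keeps
    every player appearing exactly once in that round.
    """
    round_view = sched[rnd]                   # basic indexing: a view into sched
    gp, sp = np.argwhere(round_view == p)[0]  # (game, slot) of player p
    gq, sq = np.argwhere(round_view == q)[0]  # (game, slot) of player q
    round_view[gp, sp], round_view[gq, sq] = q, p


# toy schedule: 2 rounds x 2 games x 2 slots
toy = np.array([[[0, 1], [2, 3]],
                [[0, 2], [1, 3]]])
swap_players(toy, 1, 2, 3)
print(toy[1].tolist())  # [[0, 3], [1, 2]]
```

Because `sched[rnd]` is basic indexing, it returns a view, so the writes land in `sched` itself. To apply many swaps at once, the fancy-indexing one-liner above is the vectorized counterpart.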


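Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.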

 