Need to speed up operations on numpy arrays in Python

Question

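I am solving an integer programming model by importing CPLEX as a library in Python. Suppose the optimization problem has a constraint of the following form (Ax = b): x0+x1+x1+x3 = 1

The indices of the x variables in this constraint are 0, 1, 1 and 3. They are stored in a list: indices=[0,1,1,3]. The coefficients of these variables are stored in another list: coeff = [1,1,1,1]

CPLEX cannot accept duplicate indices, so the constraint should look like this: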

x0+2x1+x3 = 1

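So the two lists should be updated like this:

indices=[0,1,3]
coeff = [1,2,1]

I have this function, which takes the indices and the coefficients as two arguments and gives me the manipulated lists: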

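import numpy as np

def manipulate(indices, coeff):
    u = np.unique(indices)
    # For every unique index, rescan the whole list and add up the matching coefficients
    sums = {ui: np.sum([coeff[c[0]] for c in np.argwhere(indices == ui)]) for ui in u}
    return list(sums.keys()), list(sums.values())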

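So manipulate([0,1,1,3], [1,1,1,1]) returns ([0, 1, 3], [1, 2, 1]).

My problem is that when I have many variables, the lists can have a length of a million, and I have millions of such constraints. When I solve my optimization problem with CPLEX, the program becomes very slow. I tracked the time spent in each function and realized that the most time-consuming part of my code is these calculations, and I guess it is because of the way I use numpy. I need to make this function more efficient to hopefully reduce the execution time. Could you please share any thoughts and suggestions on how to change the function manipulate?

Thank you very much,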

Answer 1

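Without resorting to native-code extensions, there will probably always be compromises:

Numpy / vectorized approaches lack hash-based algorithms and, imho, will suffer in algorithmic complexity (e.g. they need to sort; they need multiple passes ...)
Python-based hash-based approaches will suffer from slow looping.

Nonetheless, I do think that your approach comes close to the worst of both worlds, and that you can gain something.

Some code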
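As a rough illustration of the vectorized side of that trade-off, here is a minimal sketch that merges duplicates without a Python-level loop and without requiring pre-sorted input. The function name merge_duplicates is hypothetical (it is not part of the original answer), and the sketch assumes numeric coefficients. Note that np.unique sorts internally, which is exactly the complexity cost mentioned above, and that np.bincount with weights returns floating-point sums.

```python
import numpy as np

def merge_duplicates(indices, coeff):
    """Sum the coefficients that share an index (hypothetical helper)."""
    indices = np.asarray(indices)
    coeff = np.asarray(coeff)
    # inverse[i] is the position of indices[i] inside `unique`
    unique, inverse = np.unique(indices, return_inverse=True)
    # Accumulate each coefficient into the bucket of its unique index.
    # With weights, bincount produces float64 sums.
    sums = np.bincount(inverse, weights=coeff, minlength=unique.size)
    return unique.tolist(), sums.tolist()

idx, co = merge_duplicates([0, 1, 1, 3], [1, 1, 1, 1])
print(idx, co)
```

If integer coefficients are required downstream, the sums can be cast back with astype before converting to a list.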

from time import perf_counter as pc
from collections import defaultdict
import numpy as np
np.random.seed(0)

def manipulate(indices, coeff):
    u = np.unique(indices)
    sums = {ui: np.sum([coeff[c[0]] for c in np.argwhere(indices == ui)]) for ui in u}
    return list(sums.keys()), list(sums.values())

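# This assumes pre-sorted indices, so that equal indices are contiguous
def manipulate_alt(indices, coeff):
  # first[k] is the position where the k-th unique index first occurs
  unique, first = np.unique(indices, return_index=True)
  cum_coeffs = np.cumsum(coeff)
  # Running total just before each group starts; first[0] - 1 == -1
  # wraps around to the grand total
  before_group = cum_coeffs[first - 1]
  # Shift left by one: each entry is now the running total at the end of its group
  group_end = np.roll(before_group, -1)
  # Differences of consecutive group-end totals are the per-group sums
  sums = np.ediff1d(group_end, to_begin=group_end[0])

  return unique.tolist(), sums.tolist()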

def manipulate_pure_py(indices, coeff):
  final = defaultdict(int)
  n = len(indices)
  for i in range(n):
    final[indices[i]] += coeff[i]

  return list(final.keys()), list(final.values())

# BAD NON-SCIENTIFIC BENCHMARK
# ----------------------------

ITERATIONS = 10
SIZE = 1000000

accu_time_manipulate = 0.0
accu_time_manipulate_alt = 0.0
accu_time_manipulate_py = 0.0

for i in range(ITERATIONS):
  indices = np.sort(np.random.randint(0, 10000, size=SIZE))
  coeffs = np.random.randint(1, 100, size=SIZE)

  start = pc()
  sol = manipulate(indices, coeffs)
  end = pc()
  accu_time_manipulate += (end-start)

  start = pc()
  sol_alt = manipulate_alt(indices, coeffs)
  end = pc()
  accu_time_manipulate_alt += (end-start)

  start = pc()
  sol_py = manipulate_pure_py(indices, coeffs)
  end = pc()
  accu_time_manipulate_py += (end-start)

  assert sol[0] == sol_alt[0]
  assert sol[1] == sol_alt[1]

  assert sol[0] == sol_py[0]
  assert sol[1] == sol_py[1]


print(accu_time_manipulate)
print(accu_time_manipulate_alt)
print(accu_time_manipulate_py)

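Timings (accumulated seconds over the 10 iterations, printed in the order manipulate, manipulate_alt, manipulate_pure_py)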

164.34614480000005
0.24998690000001744
8.751806900000059
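The fastest variant above relies on the indices being pre-sorted. If that cannot be guaranteed, one possible way to meet the assumption is to sort first and then sum each contiguous group. The sketch below is an illustration only: the name manipulate_sorted_first is hypothetical, np.add.reduceat is swapped in for the cumulative-sum trick, and no timings were measured for it.

```python
import numpy as np

def manipulate_sorted_first(indices, coeff):
    """Sort by index, then sum each run of equal indices (hypothetical helper)."""
    indices = np.asarray(indices)
    coeff = np.asarray(coeff)
    # A stable sort keeps coefficients of equal indices in their original order
    order = np.argsort(indices, kind="stable")
    sorted_idx = indices[order]
    sorted_coeff = coeff[order]
    # starts[k] is where the k-th group of equal indices begins
    unique, starts = np.unique(sorted_idx, return_index=True)
    # reduceat sums sorted_coeff[starts[k]:starts[k+1]]; the last group runs to the end
    sums = np.add.reduceat(sorted_coeff, starts)
    return unique.tolist(), sums.tolist()

print(manipulate_sorted_first([3, 1, 0, 1], [5, 2, 1, 4]))
```

Unlike a weighted bincount, this keeps the dtype of the coefficients, so integer inputs give integer sums.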

Answer 1: accepted, 2021-01-20 18:46:36
