A more memory-efficient alternative to numpy.where?

Question

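I have a large array (a few million elements) and need to slice out a small subset (a few hundred elements) based on several different criteria. I'm currently using np.where, along these lines: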

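import numpy as np

for threshold in np.arange(0, 1, 0.1):
    x = np.random.random(5000000)
    y = np.random.random(5000000)
    z = np.random.random(5000000)
    # Indices of the elements that satisfy all four conditions
    inds = np.where((x < threshold) & (y > threshold) & (z > threshold) & (z < threshold + 0.1))

    # DoSomeJunk is a placeholder; a, b, c are other arrays of the same length (not shown)
    DoSomeJunk(a[inds], b[inds], c[inds])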

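I then use inds to extract the right points from various arrays. However, I get a MemoryError on the np.where line. I've seen in a few other related posts that np.where can be a memory hog and copies data.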

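Does having multiple & operators mean the data is copied multiple times? Is there a more efficient way to slice the data that is less memory-intensive while still keeping a list of the indices I want, so that I can reuse the same slice in several places later?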

Note that the example I posted doesn't actually produce the error, but its structure is similar to what I have.

Answer 1

Each condition creates a temporary boolean array the same size as x, y, and z. To optimize this, you can build the mask iteratively:

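import numpy as np

for threshold in np.arange(0, 1, 0.1):
    x = np.random.random(5000000)
    y = np.random.random(5000000)
    z = np.random.random(5000000)
    # Start with one boolean mask, then narrow it in place
    inds = x < threshold
    inds &= y > threshold
    inds &= z > threshold
    inds &= z < threshold + 0.1

    # DoSomeJunk is a placeholder; a, b, c are other arrays of the same length (not shown)
    DoSomeJunk(a[inds], b[inds], c[inds])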

For this example, this reduces memory usage from 160 MB to 40 MB.
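Even with `&=`, each right-hand comparison such as `y > threshold` still allocates a fresh boolean array before it is merged into the mask. If that matters, one possible refinement is to write every comparison into a preallocated buffer through the ufunc `out=` argument, and then call `np.flatnonzero` to get a compact index array that can be reused across several arrays. The following is only a sketch: `band_mask`, `out`, `scratch`, and `width` are hypothetical names chosen for illustration, and the array length is reduced to keep the demo light.

```python
import numpy as np


def band_mask(x, y, z, threshold, width=0.1, out=None, scratch=None):
    """Mask for x < t, y > t and t < z < t + width, using two reusable buffers.

    Hypothetical helper for illustration; `out` and `scratch` must be
    boolean arrays with the same shape as x if they are supplied.
    """
    if out is None:
        out = np.empty(x.shape, dtype=bool)
    if scratch is None:
        scratch = np.empty(x.shape, dtype=bool)

    np.less(x, threshold, out=out)               # out = (x < t)
    np.greater(y, threshold, out=scratch)        # scratch = (y > t)
    np.logical_and(out, scratch, out=out)        # merge in place
    np.greater(z, threshold, out=scratch)        # scratch = (z > t)
    np.logical_and(out, scratch, out=out)
    np.less(z, threshold + width, out=scratch)   # scratch = (z < t + width)
    np.logical_and(out, scratch, out=out)
    return out


rng = np.random.default_rng(0)
n = 1_000_000  # assumption: smaller than the 5,000,000 used in the question
x, y, z = rng.random(n), rng.random(n), rng.random(n)

# Allocate the two boolean buffers once and reuse them for every threshold.
mask = np.empty(n, dtype=bool)
scratch = np.empty(n, dtype=bool)

counts = []
for threshold in np.arange(0, 1, 0.1):
    band_mask(x, y, z, threshold, out=mask, scratch=scratch)
    idx = np.flatnonzero(mask)  # integer indices, sized by the number of hits
    counts.append(idx.size)
    # idx can now index any array of the same length, e.g. x[idx], y[idx], z[idx]
```

The index array `idx` holds only as many entries as there are matches, so keeping it around for later use is cheap when only a small fraction of elements is selected. Whether the extra buffers are worth the added code depends on how tight memory actually is.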

Solution 1
2 Accepted 2019-02-27 17:50:51
