Most efficient way to convert numpy arrays to dictionaries

Question

I have 2 numpy arrays:

import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

I need to create a list of dictionaries:

res = [{"a": 1, "b": 10},
       {"a": 2, "b": 20},
       {"a": 3, "b": 30}]

I want to do this in the most efficient way, without iterating over the whole arrays.

The obvious solution:

res = [{"a": a_el, "b": b_el} for a_el, b_el in zip(a, b)]

takes too long when a and b contain many values.
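One variant often suggested for this pattern converts each array with `ndarray.tolist()` before zipping. The loop then works on native Python ints and floats instead of creating a NumPy scalar for every element. This is a sketch that assumes plain Python numbers in the output are acceptable. It still visits every element, because building n dictionaries unavoidably means creating n Python objects. `arrays_to_dicts` is a hypothetical helper name, not a NumPy API.

```python
import numpy as np


def arrays_to_dicts(**columns):
    """Build one dict per row from equally long 1-D arrays (hypothetical helper).

    Keyword names become the dict keys, e.g. arrays_to_dicts(a=a, b=b).
    """
    keys = list(columns)
    # tolist() converts each array to a list of native Python scalars,
    # so the comprehension below does not create NumPy scalar objects.
    values = [np.asarray(col).tolist() for col in columns.values()]
    return [dict(zip(keys, row)) for row in zip(*values)]


a = np.array([1, 2, 3])
b = np.array([10, 20, 30])
res = arrays_to_dicts(a=a, b=b)
print(res)  # [{'a': 1, 'b': 10}, {'a': 2, 'b': 20}, {'a': 3, 'b': 30}]
```

Keyword arguments keep their order (Python 3.7+), so the dict keys come out in the order they are passed.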

Answer 1

If you are also willing to import pandas, you can do this:

import pandas as pd

df = pd.DataFrame({"a": a, "b": b})
res = df.to_dict(orient='records')

This gives the desired res:

[{'a': 1, 'b': 10}, {'a': 2, 'b': 20}, {'a': 3, 'b': 30}]

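~~Depending on the size of your arrays, this may not be worth it.~~ Regardless of the size of your arrays, this does not seem to be worth it. I will leave this answer up for its educational value and update it to compare the runtimes of the approaches others have suggested.

When I time the two approaches on my computer, the zip approach is always faster than the pandas approach, so disregard the first part of this answer.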

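Ranking (fastest to slowest)

Plain old zip
0x0fba's np.column_stack
My approach: creating a DataFrame and calling df.to_dict
= crashMOGWAI's numba approach (the timing of the first function call is skewed by compilation time)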

import timeit
from numba import jit
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt


def f_zip(a, b):
    return [{"a": ai, "b": bi} for ai, bi in zip(a, b)]

def f_pd(a, b):
    df = pd.DataFrame({"a": a, "b": b})
    return df.to_dict(orient='records')

def f_col_stack(a, b):
    return [{"a": a_el, "b": b_el} for a_el, b_el in np.column_stack((a, b))]

@jit
def f_numba(a, b):
    return [{"a": a_el, "b": b_el} for a_el, b_el in zip(a, b)]


funcs = [f_zip, f_pd, f_col_stack, f_numba]
sizes = [5, 10, 50, 100, 500, 1000, 5000, 10_000, 50_000, 100_000]
times = np.zeros((len(sizes), len(funcs)))

N = 20  # number of calls averaged for each timing

for i, s in enumerate(sizes):
    a = np.random.random((s,))
    b = np.random.random((s,))
    for j, f in enumerate(funcs):
        times[i, j] = timeit.timeit("f(a, b)", globals=globals(), number=N) / N
        print(".", end="")
    print(s)
        
fig, ax = plt.subplots()
for j, f in enumerate(funcs):
    ax.plot(sizes, times[:, j], label=f.__name__)


ax.set_xlabel("Array size")
ax.set_ylabel("Time per function call (s)")
ax.set_xscale("log")
ax.set_yscale("log")
ax.legend()
ax.grid()
fig.tight_layout()
plt.show()
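If numba and matplotlib are not installed, you can still run a stripped-down comparison that uses only NumPy and the standard library. This is a sketch, not a reproduction of the plot above. `f_zip` and `f_col_stack` mirror the functions defined above. `f_tolist` is an added variant that zips `a.tolist()` and `b.tolist()`. The sizes and call count are arbitrary choices. Timings depend on the machine, so none are quoted here.

```python
import timeit

import numpy as np


def f_zip(a, b):
    return [{"a": ai, "b": bi} for ai, bi in zip(a, b)]


def f_tolist(a, b):
    # Same as f_zip, but iterates over native Python floats instead of NumPy scalars.
    return [{"a": ai, "b": bi} for ai, bi in zip(a.tolist(), b.tolist())]


def f_col_stack(a, b):
    return [{"a": a_el, "b": b_el} for a_el, b_el in np.column_stack((a, b))]


funcs = [f_zip, f_tolist, f_col_stack]

for size in (1_000, 100_000):
    a = np.random.random(size)
    b = np.random.random(size)
    for f in funcs:
        # Average seconds per call over 10 calls.
        t = timeit.timeit(lambda: f(a, b), number=10) / 10
        print(f"{f.__name__:12s} n={size:>7d} {t:.6f} s")
```

All three functions return equal records for float input; only the element types differ (Python floats for `f_tolist`, NumPy scalars for the other two).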

Answer 2

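I would say your current approach is fairly efficient. Without knowing any other details, you could use numba to precompile it and save some execution time. The Jupyter cells below make some assumptions about orders of magnitude and available memory.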

# %%
import numpy as np
from numba import jit

# %%
x = np.arange(1, 1_000_000)
y = np.arange(10, 10_000_000, 10)  # same length as x, so zip does not truncate
test = [{"a": a_el, "b": b_el} for a_el, b_el in zip(x, y)]

# %%
@jit
def f():
    a = np.arange(1, 1_000_000)
    b = np.arange(10, 10_000_000, 10)  # same length as a
    return [{"a": a_el, "b": b_el} for a_el, b_el in zip(a, b)]

Answer 3

You can try this one-liner. It uses a list comprehension over numpy's column_stack() to build the list of dictionaries:

res = [{"a": a_el, "b": b_el} for a_el, b_el in np.column_stack((a, b))]
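One side effect is worth knowing. `np.column_stack` builds a single new 2-D array, so both columns are cast to a common dtype and each row is unpacked into NumPy scalars. With an integer `a` and a float `b`, the `a` values therefore come out as floats. The quick check below uses a made-up float array for `b`, which is an assumption and not data from the question.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([0.5, 1.5, 2.5])  # illustrative float data, not from the question

stacked = np.column_stack((a, b))
print(stacked.dtype)  # float64: the int column is upcast

res = [{"a": a_el, "b": b_el} for a_el, b_el in stacked]
print(type(res[0]["a"]))  # <class 'numpy.float64'>, not an integer type
```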

3 solutions

Solution 1
Score 3, accepted, 2022-12-06 15:53:58

Solution 2
Score 1, 2022-12-06 16:01:23

Solution 3
Score 1, 2022-12-06 16:04:15
