How to efficiently read values from a pandas DataFrame using an array of column names

Question

import pandas as pd

df = pd.DataFrame({"col_a": [1, 2, 3], "col_b": [5, 4, 0], "col_c": [9, 7, 6]})
# One list of column names per row of df
cols = [["col_a", "col_b"], ["col_c", "col_b"], ["col_a", "col_b"]]

# expected output: [[1, 5], [7, 4], [3, 0]]

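I know this can be done with a list comprehension, but I am looking for a more efficient approach because I have more than a million records.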

Answer 1

Here is the list comprehension you forgot to include:

In [27]: [row[1][col].to_list() for row, col in zip(df.iterrows(), cols)]
Out[27]: [[1, 5], [7, 4], [3, 0]]

Answer 2

I don't think you can do this without iterating over the cols variable. Try this:

[df.loc[i,j].tolist() for i,j in enumerate(cols)]

[[1, 5], [7, 4], [3, 0]]
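
Note that enumerate(cols) yields the positions 0, 1, 2, so the line above relies on df having the default integer index. A minimal sketch, assuming a hypothetical string index ("x", "y", "z") that is not part of the question, pairs each column list with the actual index label instead:

```python
import pandas as pd

df = pd.DataFrame(
    {"col_a": [1, 2, 3], "col_b": [5, 4, 0], "col_c": [9, 7, 6]},
    index=["x", "y", "z"],  # hypothetical non-default index, for illustration only
)
cols = [["col_a", "col_b"], ["col_c", "col_b"], ["col_a", "col_b"]]

# Pair each row label with its own list of column names
result = [df.loc[i, j].tolist() for i, j in zip(df.index, cols)]
print(result)
```

This is still a Python-level loop, so it is unlikely to be faster than the original comprehension. It only removes the dependency on the index being 0..n-1.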

Answer 3

You can map your labels to integer column positions and then use np.take_along_axis:

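import numpy as np

# Map each column label to its integer position in df.columns
d = {c: i for i, c in enumerate(df.columns)}
# Translate the lists of labels into a 2-D array of column positions
idx = pd.DataFrame(cols).replace(d).to_numpy()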
#array([[0, 1],
#       [2, 1],
#       [0, 1]])

np.take_along_axis(df.to_numpy(), idx, axis=1)
#array([[1, 5],
#       [7, 4],
#       [3, 0]])
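
A possible variant of the same idea, sketched here as an assumption rather than as part of the original answer, looks the positions up with Index.get_indexer. That call returns an integer array directly, so the result does not depend on how replace infers the dtype of idx. It assumes every row requests the same number of columns and that all columns of df share a common dtype:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col_a": [1, 2, 3], "col_b": [5, 4, 0], "col_c": [9, 7, 6]})
cols = [["col_a", "col_b"], ["col_c", "col_b"], ["col_a", "col_b"]]

# 2-D array of labels, one row per row of df
labels = np.asarray(cols)

# Look up the integer position of every label, then restore the 2-D shape
idx = df.columns.get_indexer(labels.ravel()).reshape(labels.shape)

# Pick, for each row, the values at that row's column positions
result = np.take_along_axis(df.to_numpy(), idx, axis=1)
print(result.tolist())
```

get_indexer returns -1 for a label that is not in df.columns, which take_along_axis would silently treat as the last column, so it may be worth checking (idx >= 0).all() before indexing.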

3 solutions

Solution 1
1 2021-03-15 20:33:14

Solution 2
0 2021-03-15 17:33:58

Solution 3
0 2021-03-15 17:59:10