Select DataFrame values based on a column of index lists

Question

I wrote some code that looks up values in a df using the list of indexes stored in another column:

import numpy as np
import pandas as pd

d = {'myvalues': [11, 13, 0, -1, 10, 14], 'neighbours': [[1,2],[0,2,3],[0,1,3],[1,2,4],[3,5],[4]]}
df = pd.DataFrame(data=d)

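# Append each row's own index to its list of neighbour indexes
df['neighboring_idxs'] = df['neighbours'] + pd.Series([[x] for x in df.index.values])
# For every row, pick myvalues at those (positional) indexes
df['neighboring_myvalues'] = df.apply(lambda row: df.myvalues.values[row.neighboring_idxs], axis=1)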

The result is:

   myvalues neighbours neighboring_idxs neighboring_myvalues
0        11     [1, 2]        [1, 2, 0]          [13, 0, 11]
1        13  [0, 2, 3]     [0, 2, 3, 1]      [11, 0, -1, 13]
2         0  [0, 1, 3]     [0, 1, 3, 2]      [11, 13, -1, 0]
3        -1  [1, 2, 4]     [1, 2, 4, 3]      [13, 0, 10, -1]
4        10     [3, 5]        [3, 5, 4]         [-1, 14, 10]
5        14        [4]           [4, 5]             [10, 14]

However, using apply on a large dataset is really time-consuming. Is there a smarter way to get the same df['neighboring_myvalues'] without using apply?

Answer 1

I don't know whether it is faster, but you can try exploding your lists:

df['neighboring_myvalues'] = (
    df.explode('neighboring_idxs').reset_index()
      .assign(vals=lambda x: df.loc[x['neighboring_idxs'], 'myvalues'].tolist())
      .groupby('index')['vals'].agg(list)
)

Output:

>>> df
   myvalues neighbours neighboring_idxs neighboring_myvalues
0        11     [1, 2]        [1, 2, 0]          [13, 0, 11]
1        13  [0, 2, 3]     [0, 2, 3, 1]      [11, 0, -1, 13]
2         0  [0, 1, 3]     [0, 1, 3, 2]      [11, 13, -1, 0]
3        -1  [1, 2, 4]     [1, 2, 4, 3]      [13, 0, 10, -1]
4        10     [3, 5]        [3, 5, 4]         [-1, 14, 10]
5        14        [4]           [4, 5]             [10, 14]
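
If the explode/groupby round trip is still slow, another thing to try is doing the lookup in NumPy: flatten all index lists into one array, index myvalues once, then split the result back into per-row pieces. This is only a sketch and has not been timed against the versions above. It assumes the indexes are positional (as in the question's `df.myvalues.values[...]`) and that no list is empty, which holds here because each row's own index is appended. The names `flat_idxs`, `lengths` and `split_points` are just illustrative.

```python
import numpy as np
import pandas as pd

d = {'myvalues': [11, 13, 0, -1, 10, 14],
     'neighbours': [[1, 2], [0, 2, 3], [0, 1, 3], [1, 2, 4], [3, 5], [4]]}
df = pd.DataFrame(data=d)

# Same column as in the question: neighbours plus the row's own index
df['neighboring_idxs'] = [nb + [i] for i, nb in zip(df.index, df['neighbours'])]

values = df['myvalues'].to_numpy()

# Length of each list, so the flat result can be cut back into rows
lengths = np.array([len(idxs) for idxs in df['neighboring_idxs']])

# One flat array holding every index from every row, in row order
flat_idxs = np.concatenate(df['neighboring_idxs'].tolist())

# A single fancy-indexing call replaces the per-row apply
flat_vals = values[flat_idxs]

# Cut positions are the cumulative lengths, excluding the final total
split_points = np.cumsum(lengths)[:-1]
df['neighboring_myvalues'] = [chunk.tolist()
                              for chunk in np.split(flat_vals, split_points)]

print(df)
```

The per-row work that remains is only the final list comprehension over already-computed chunks. Whether that helps in practice depends on the number of rows and the list lengths, so it is worth measuring on your own data.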
