Selecting DataFrame values based on column of indices in list

I wrote some code that gets the values of df based on a list of indices stored in another column:

import numpy as np
import pandas as pd

d = {'myvalues': [11, 13, 0, -1, 10, 14], 'neighbours': [[1,2],[0,2,3],[0,1,3],[1,2,4],[3,5],[4]]}
df = pd.DataFrame(data=d)

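# Append each row's own index to its list of neighbours
df['neighboring_idxs'] = df['neighbours'] + pd.Series([[x] for x in df.index.values])
# Look up myvalues at those positions, one row at a time
df['neighboring_myvalues'] = df.apply(lambda row: df.myvalues.values[row.neighboring_idxs], axis=1)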

The result is:

   myvalues neighbours neighboring_idxs neighboring_myvalues
0        11     [1, 2]        [1, 2, 0]          [13, 0, 11]
1        13  [0, 2, 3]     [0, 2, 3, 1]      [11, 0, -1, 13]
2         0  [0, 1, 3]     [0, 1, 3, 2]      [11, 13, -1, 0]
3        -1  [1, 2, 4]     [1, 2, 4, 3]      [13, 0, 10, -1]
4        10     [3, 5]        [3, 5, 4]         [-1, 14, 10]
5        14        [4]           [4, 5]             [10, 14]

However, using apply on a large dataset is very time-consuming. Is there a smarter way to get the same df['neighboring_myvalues'] without using apply?

I don't know whether it is faster, but you could try exploding your lists:

df['neighboring_myvalues'] = (
    df.explode('neighboring_idxs').reset_index()
      .assign(vals=lambda x: df.loc[x['neighboring_idxs'], 'myvalues'].tolist())
      .groupby('index')['vals'].agg(list)
)

Output:

>>> df
   myvalues neighbours neighboring_idxs neighboring_myvalues
0        11     [1, 2]        [1, 2, 0]          [13, 0, 11]
1        13  [0, 2, 3]     [0, 2, 3, 1]      [11, 0, -1, 13]
2         0  [0, 1, 3]     [0, 1, 3, 2]      [11, 13, -1, 0]
3        -1  [1, 2, 4]     [1, 2, 4, 3]      [13, 0, 10, -1]
4        10     [3, 5]        [3, 5, 4]         [-1, 14, 10]
5        14        [4]           [4, 5]             [10, 14]
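
Another apply-free option, offered only as a sketch and not benchmarked here: flatten all index lists into one NumPy array, do a single fancy-indexing lookup, and split the result back into per-row chunks. It assumes, like the original code, that the indices are positions into myvalues (default RangeIndex). The column name neighboring_myvalues_np is made up for this illustration.

```python
import numpy as np
import pandas as pd

d = {'myvalues': [11, 13, 0, -1, 10, 14],
     'neighbours': [[1, 2], [0, 2, 3], [0, 1, 3], [1, 2, 4], [3, 5], [4]]}
df = pd.DataFrame(data=d)
df['neighboring_idxs'] = df['neighbours'] + pd.Series([[x] for x in df.index.values])

values = df['myvalues'].to_numpy()
idx_lists = df['neighboring_idxs'].tolist()

# Length of each row's list, needed to cut the flat result back into rows
lengths = np.array([len(ix) for ix in idx_lists])

# One flat array of positions, then a single vectorized lookup
flat_idx = np.concatenate(idx_lists).astype(np.intp)
flat_vals = values[flat_idx]

# Split at the cumulative lengths (the last boundary is the end of the array)
chunks = np.split(flat_vals, np.cumsum(lengths)[:-1])

# Hypothetical column name, chosen so it does not clash with the original
df['neighboring_myvalues_np'] = pd.Series([c.tolist() for c in chunks], index=df.index)
```

Whether this beats apply or explode depends on the data, so it is worth timing on a realistic sample before adopting it.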
