Selecting DataFrame values based on column of indices in list
I wrote some code that gets values from a df based on the list of indices stored in another column:
import numpy as np
import pandas as pd
d = {'myvalues': [11, 13, 0, -1, 10, 14], 'neighbours': [[1,2],[0,2,3],[0,1,3],[1,2,4],[3,5],[4]]}
df = pd.DataFrame(data=d)
df['neighboring_idxs'] = df['neighbours']+pd.Series(([[x] for x in df.index.values]))
df['neighboring_myvalues'] = df.apply(lambda row: df.myvalues.values[row.neighboring_idxs], axis=1)
The result is:
myvalues neighbours neighboring_idxs neighboring_myvalues
0 11 [1, 2] [1, 2, 0] [13, 0, 11]
1 13 [0, 2, 3] [0, 2, 3, 1] [11, 0, -1, 13]
2 0 [0, 1, 3] [0, 1, 3, 2] [11, 13, -1, 0]
3 -1 [1, 2, 4] [1, 2, 4, 3] [13, 0, 10, -1]
4 10 [3, 5] [3, 5, 4] [-1, 14, 10]
5 14 [4] [4, 5] [10, 14]
However, using apply on a large dataset is very slow. Is there a smarter way to get the same df['neighboring_myvalues'] without using apply?
I don't know whether it is faster, but you could try exploding your lists:
df['neighboring_myvalues'] = (
    df.explode('neighboring_idxs').reset_index()
      .assign(vals=lambda x: df.loc[x['neighboring_idxs'], 'myvalues'].tolist())
      .groupby('index')['vals'].agg(list)
)
Output:
>>> df
myvalues neighbours neighboring_idxs neighboring_myvalues
0 11 [1, 2] [1, 2, 0] [13, 0, 11]
1 13 [0, 2, 3] [0, 2, 3, 1] [11, 0, -1, 13]
2 0 [0, 1, 3] [0, 1, 3, 2] [11, 13, -1, 0]
3 -1 [1, 2, 4] [1, 2, 4, 3] [13, 0, 10, -1]
4 10 [3, 5] [3, 5, 4] [-1, 14, 10]
5 14 [4] [4, 5] [10, 14]
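Another option that avoids both apply and groupby is to flatten all the index lists into one NumPy array, look up every value with a single indexing call, and then split the result back into per-row lists. The following is only a sketch and I have not benchmarked it. It assumes, like the original apply version, that the indices are positional (i.e. the frame has the default 0..n-1 index). The names lengths, flat_idx, flat_vals and pieces are made up for illustration.

```python
import numpy as np
import pandas as pd

d = {'myvalues': [11, 13, 0, -1, 10, 14],
     'neighbours': [[1, 2], [0, 2, 3], [0, 1, 3], [1, 2, 4], [3, 5], [4]]}
df = pd.DataFrame(data=d)
df['neighboring_idxs'] = df['neighbours'] + pd.Series([[x] for x in df.index.values])

# Length of each row's index list, plus one flat integer array of all indices.
lengths = df['neighboring_idxs'].map(len).to_numpy()
flat_idx = np.concatenate(df['neighboring_idxs'].tolist()).astype(int)

# One positional lookup for every index at once (assumes a default RangeIndex).
flat_vals = df['myvalues'].to_numpy()[flat_idx]

# Cut the flat result back into per-row pieces at the cumulative lengths.
pieces = np.split(flat_vals, np.cumsum(lengths)[:-1])
df['neighboring_myvalues'] = pd.Series([p.tolist() for p in pieces], index=df.index)
```

The final list comprehension is still a Python-level loop, so whether this beats explode or apply on your data is something to measure rather than assume.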