Vectorized column creation, where each value is pulled from a different column in a multi-index pandas df
I have a multi-index DataFrame that looks like this:
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(
    {
        'trial': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2],
        't': [0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,0,0,0,0,1,1,1,1],
        'context': [0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3],
        'x': np.random.rand(40),
        'y': np.random.rand(40),
        'z': np.random.rand(40),
        'inferred_context': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1],
    }
)
# Move context into the columns: rows are (trial, t), columns are (vals, context).
df = df.set_index(["trial", "t", "context"])
df = df.unstack('context')
df.columns.set_names(['vals', 'context'], inplace=True)
# Note: the result is not assigned, so df itself keeps the (trial, t) row order.
df.swaplevel(0, 1)
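So it has a two-level row index (trial, t) and a two-level column index (vals, context).
I want to create a column called "x_inferred_context" that holds, for each trial and time point t, the "x" value of the context named in "inferred_context".
For example, if at row (trial=0, t=3) "inferred_context" is 0 and the "x" value at (vals=x, context=0) is 0.204452, then "x_inferred_context" should contain 0.204452 at row (trial=0, t=3). Likewise, if at row (trial=1, t=0) "inferred_context" is 2 and the "x" value for context 2 is 0.140387, then "x_inferred_context" at (1, 0) should contain 0.140387.
I achieved this by building a boolean mask for each possible context, multiplying the mask by df['x'], and summing across the columns: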
nc = np.unique((df.columns.get_level_values(level=1))).size
mask = pd.DataFrame( data = [(df['inferred_context'] == c).iloc[:,c] for c in range(nc)]).T
df['x_inferred_context'] = (mask*df['x']).sum(axis=1)
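I am new to pandas and would like to ask what the right way to do this is. Is there a cleaner, more readable, more pandas-like way?
Cheers!
Using a double indexing lookup: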
idx, col = pd.factorize(df.index.get_level_values('t'))
idx2 = df['inferred_context'].reindex(col, axis=1).to_numpy()[np.arange(len(df)), idx]
df[('x_inferred_context', None)] = df['x'].reindex(col, axis=1).to_numpy()[np.arange(len(df)), idx2]
Output:
vals            x                                       y                      \
context       0.0       1.0       2.0       3.0       0.0       1.0       2.0
trial t
0     0  0.417022  0.720324  0.000114  0.302333  0.988861  0.748166  0.280444
      1  0.146756  0.092339  0.186260  0.345561  0.103226  0.447894  0.908596
      2  0.396767  0.538817  0.419195  0.685220  0.287775  0.130029  0.019367
      3  0.204452  0.878117  0.027388  0.670468  0.211628  0.265547  0.491573
1     0  0.417305  0.558690  0.140387  0.198101  0.574118  0.146729  0.589306
      1  0.800745  0.968262  0.313424  0.692323  0.102334  0.414056  0.694400
      2  0.876389  0.894607  0.085044  0.039055  0.049953  0.535896  0.663795
      3  0.169830  0.878143  0.098347  0.421108  0.944595  0.586555  0.903402
2     0  0.957890  0.533165  0.691877  0.315516  0.139276  0.807391  0.397677
      1  0.686501  0.834626  0.018288  0.750144  0.927509  0.347766  0.750812

vals            y         z                                inferred_context \
context       3.0       0.0       1.0       2.0       3.0               0.0
trial t
0     0  0.789279  0.883306  0.623672  0.750942  0.348898                 0
      1  0.293614  0.269928  0.895886  0.428091  0.964840                 0
      2  0.678836  0.663441  0.621696  0.114746  0.949489                 0
      3  0.053363  0.449912  0.578390  0.408137  0.237027                 0
1     0  0.699758  0.903380  0.573679  0.002870  0.617145                 2
      1  0.414179  0.326645  0.527058  0.885942  0.357270                 2
      2  0.514889  0.908535  0.623360  0.015821  0.929437                 2
      3  0.137475  0.690897  0.997323  0.172341  0.137136                 2
2     0  0.165354  0.932595  0.696818  0.066000  0.755463                 1
      1  0.725998  0.753876  0.923025  0.711525  0.124271                 1

vals                    x_inferred_context
context  1.0  2.0  3.0                 NaN
trial t
0     0    0    0    0            0.417022
      1    0    0    0            0.146756
      2    0    0    0            0.396767
      3    0    0    0            0.204452
1     0    2    2    2            0.140387
      1    2    2    2            0.313424
      2    2    2    2            0.085044
      3    2    2    2            0.098347
2     0    1    1    1            0.533165
      1    1    1    1            0.834626
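As far as I can tell, the lookup above uses the factorized t codes as column positions into inferred_context. That gives the right answer here because every context column of inferred_context carries the same value and the t values happen to coincide with context labels. A minimal sketch of a more direct variant, under that same assumption, reads the inferred context from the first column and pairs each row with the matching column of x. The small frame below is a stand-in with arbitrary values, not the question's data, and the names `long_df`, `inferred`, `col_pos` and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

# Stand-in data with the same layout as the question (values are arbitrary).
n_trials, n_t, n_ctx = 2, 3, 4
rng = np.random.default_rng(0)
long_df = pd.DataFrame({
    "trial": np.repeat(np.arange(n_trials), n_t * n_ctx),
    "t": np.tile(np.repeat(np.arange(n_t), n_ctx), n_trials),
    "context": np.tile(np.arange(n_ctx), n_trials * n_t),
    "x": rng.random(n_trials * n_t * n_ctx),
    # Assumption: one inferred context per trial, repeated on every row.
    "inferred_context": np.repeat([3, 1], n_t * n_ctx),
})
df = long_df.set_index(["trial", "t", "context"]).unstack("context")
df.columns = df.columns.set_names(["vals", "context"])

# Every context column of inferred_context is identical, so read the first.
inferred = df["inferred_context"].iloc[:, 0].to_numpy()

x = df["x"]
# Translate context labels into column positions of x.
col_pos = x.columns.get_indexer(inferred)
# NumPy integer indexing: pair row i with column col_pos[i].
picked = x.to_numpy()[np.arange(len(x)), col_pos]

result = pd.Series(picked, index=df.index, name="x_inferred_context")
print(result)
```

Keeping the result as a separate Series avoids adding a column with a missing second level to the column MultiIndex, which is what turns the context labels into floats in the output above. If the column is wanted inside df, it can still be assigned as in the answer.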