Vectorized column creation where each value is taken from a different column of a MultiIndex pandas DataFrame

Question

I have a MultiIndex DataFrame that looks like this:

import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(
    {
        'trial':  [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2],
        't':      [0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,0,0,0,0,1,1,1,1],
        'context':[0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3],
        'x': np.random.rand(40),
        'y': np.random.rand(40),
        'z': np.random.rand(40),
        'inferred_context':   [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1],
    }
)

# Move trial, t and context into the row index, then pivot context into the columns
df = df.set_index(['trial', 't', 'context'])
df = df.unstack('context')
df.columns.set_names(['vals', 'context'], inplace=True)
# swaplevel returns a new frame; the result is not assigned here, so df itself is unchanged
df.swaplevel(0, 1)

So it has a two-level row index (trial, t) and a two-level column index (vals, context).
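
To make that layout concrete, here is a minimal sketch on a tiny hand-made frame. The names `long`, `demo` and the values are made up for illustration and are not the question's data. It shows the level names on both axes and that selecting a top-level key such as `"x"` returns the per-context sub-frame.

```python
import pandas as pd

# Tiny stand-in for the question's data: 2 (trial, t) rows, 2 contexts.
long = pd.DataFrame({
    "trial":   [0, 0, 0, 0],
    "t":       [0, 0, 1, 1],
    "context": [0, 1, 0, 1],
    "x":       [0.1, 0.2, 0.3, 0.4],
    "inferred_context": [1, 1, 0, 0],
})
demo = long.set_index(["trial", "t", "context"]).unstack("context")
demo.columns = demo.columns.set_names(["vals", "context"])

row_levels = list(demo.index.names)    # names of the row index levels
col_levels = list(demo.columns.names)  # names of the column index levels
x_block = demo["x"]                    # one column per context, rows indexed by (trial, t)

print(row_levels, col_levels)
print(x_block)
```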

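I want to create a column named "x_inferred_context" that contains, for each trial and time point t, the "x" value of the context specified in "inferred_context".

For example, if at row (trial=0, t=3) "inferred_context" is 0 and "x" at (vals=x, context=0) is 0.204452, then column "x_inferred_context" should contain 0.204452 at row (trial=0, t=3). Likewise, if at row (trial=1, t=0) "inferred_context" is 2 and the "x" value for context 2 is 0.140387, then "x_inferred_context" at (1, 0) should contain 0.140387.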
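
The target can be spelled out as a scalar lookup for a single row, which is useful as a reference when checking any vectorized version. This is only a sketch: `x_at_inferred_context` is a hypothetical helper, and it assumes, as in the data above, that "inferred_context" holds the same value in every context column of a row.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (same seed and column order, so the values match).
np.random.seed(1)
n = 40
df = pd.DataFrame({
    "trial": [0] * 16 + [1] * 16 + [2] * 8,
    "t": [(i // 4) % 4 for i in range(n)],
    "context": [i % 4 for i in range(n)],
    "x": np.random.rand(n),
    "y": np.random.rand(n),
    "z": np.random.rand(n),
    "inferred_context": [0] * 16 + [2] * 16 + [1] * 8,
})
df = df.set_index(["trial", "t", "context"]).unstack("context")
df.columns = df.columns.set_names(["vals", "context"])


def x_at_inferred_context(frame, trial, t):
    """Hypothetical helper: reference lookup for one (trial, t) row."""
    # Read the inferred context from the context-0 column (all columns agree).
    ctx = int(frame.loc[(trial, t), ("inferred_context", 0)])
    # Use it as the column label inside the "x" block.
    return frame.loc[(trial, t), ("x", ctx)]


print(x_at_inferred_context(df, 0, 3))  # the question reports 0.204452 here
print(x_at_inferred_context(df, 1, 0))  # the question reports 0.140387 here
```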

I achieved this by creating a mask for each possible context, multiplying the mask by df['x'], and summing across the columns.

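# Number of distinct contexts (second column level)
nc = df.columns.get_level_values('context').nunique()
# One boolean column per context: True where inferred_context equals that context
mask = pd.DataFrame(data=[(df['inferred_context'] == c).iloc[:, c] for c in range(nc)]).T
# Zero out the non-selected contexts and sum, leaving the selected 'x' value per row
df['x_inferred_context'] = (mask * df['x']).sum(axis=1)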
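
The mask-and-sum idea can be seen in isolation with plain NumPy. This is a sketch on made-up numbers (the arrays `x` and `inferred` are illustrative, not the question's data): each row of the mask is True only in the column of the inferred context, so multiplying and summing leaves exactly one value per row.

```python
import numpy as np

# Two rows, three contexts; values are illustrative only.
x = np.array([[10.0, 11.0, 12.0],
              [20.0, 21.0, 22.0]])
inferred = np.array([2, 0])  # context to pick in each row

# Compare each row's inferred context against the column positions 0..n-1.
mask = inferred[:, None] == np.arange(x.shape[1])

# False counts as 0 and True as 1, so the sum keeps only the selected value.
picked = (mask * x).sum(axis=1)
print(picked)  # [12. 20.]
```

One caveat of this form: in plain NumPy a NaN in a non-selected column still propagates, because 0 * NaN is NaN.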

I'm new to pandas and wanted to ask what the right way to do this is. Is there a cleaner, more readable, more pandas-like way?

Cheers!

Answer 1

Use a double indexing lookup:

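# Factorize the 't' level: idx holds the integer codes, col the unique labels.
# Note: this works here because the 't' values (0-3) coincide with the context labels.
idx, col = pd.factorize(df.index.get_level_values('t'))

# 'inferred_context' is repeated in every context column, so any column yields the inferred context of the row
idx2 = df['inferred_context'].reindex(col, axis=1).to_numpy()[np.arange(len(df)), idx]

# For each row, pick the 'x' value in the column of its inferred context
df[('x_inferred_context', None)] = df['x'].reindex(col, axis=1).to_numpy()[np.arange(len(df)), idx2]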

Output:

vals            x                                       y                      \
context       0.0       1.0       2.0       3.0       0.0       1.0       2.0   
trial t                                                                         
0     0  0.417022  0.720324  0.000114  0.302333  0.988861  0.748166  0.280444   
      1  0.146756  0.092339  0.186260  0.345561  0.103226  0.447894  0.908596   
      2  0.396767  0.538817  0.419195  0.685220  0.287775  0.130029  0.019367   
      3  0.204452  0.878117  0.027388  0.670468  0.211628  0.265547  0.491573   
1     0  0.417305  0.558690  0.140387  0.198101  0.574118  0.146729  0.589306   
      1  0.800745  0.968262  0.313424  0.692323  0.102334  0.414056  0.694400   
      2  0.876389  0.894607  0.085044  0.039055  0.049953  0.535896  0.663795   
      3  0.169830  0.878143  0.098347  0.421108  0.944595  0.586555  0.903402   
2     0  0.957890  0.533165  0.691877  0.315516  0.139276  0.807391  0.397677   
      1  0.686501  0.834626  0.018288  0.750144  0.927509  0.347766  0.750812   

vals                      z                               inferred_context  \
context       3.0       0.0       1.0       2.0       3.0              0.0   
trial t                                                                      
0     0  0.789279  0.883306  0.623672  0.750942  0.348898                0   
      1  0.293614  0.269928  0.895886  0.428091  0.964840                0   
      2  0.678836  0.663441  0.621696  0.114746  0.949489                0   
      3  0.053363  0.449912  0.578390  0.408137  0.237027                0   
1     0  0.699758  0.903380  0.573679  0.002870  0.617145                2   
      1  0.414179  0.326645  0.527058  0.885942  0.357270                2   
      2  0.514889  0.908535  0.623360  0.015821  0.929437                2   
      3  0.137475  0.690897  0.997323  0.172341  0.137136                2   
2     0  0.165354  0.932595  0.696818  0.066000  0.755463                1   
      1  0.725998  0.753876  0.923025  0.711525  0.124271                1   

vals                x_inferred_context  
context 1.0 2.0 3.0                NaN  
trial t                                 
0     0   0   0   0           0.417022  
      1   0   0   0           0.146756  
      2   0   0   0           0.396767  
      3   0   0   0           0.204452  
1     0   2   2   2           0.140387  
      1   2   2   2           0.313424  
      2   2   2   2           0.085044  
      3   2   2   2           0.098347  
2     0   1   1   1           0.533165  
      1   1   1   1           0.834626
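
A possible variation on this lookup, sketched on a small made-up frame rather than the question's data: instead of factorizing the `t` level, it maps the inferred context labels to column positions with `Index.get_indexer`, so it does not depend on the `t` values matching the context labels. It assumes "inferred_context" is identical across the context columns of each row. The result is cross-checked against a boolean filter on the long-format table, which is an option only if that table is still available.

```python
import numpy as np
import pandas as pd

# Small stand-in in long format: 3 (trial, t) rows, 3 contexts.
long = pd.DataFrame({
    "trial":   [0, 0, 0, 0, 0, 0, 1, 1, 1],
    "t":       [0, 0, 0, 1, 1, 1, 0, 0, 0],
    "context": [0, 1, 2, 0, 1, 2, 0, 1, 2],
    "x":       [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    "inferred_context": [2, 2, 2, 0, 0, 0, 1, 1, 1],
})
wide = long.set_index(["trial", "t", "context"]).unstack("context")
wide.columns = wide.columns.set_names(["vals", "context"])

# inferred_context is repeated across the context columns, so take the first one.
ctx = wide["inferred_context"].iloc[:, 0].to_numpy()

# Translate context labels into column positions within the "x" block.
x_block = wide["x"]
pos = x_block.columns.get_indexer(ctx)
if (pos < 0).any():
    raise KeyError("an inferred context has no matching 'x' column")

# Row i takes the value at column position pos[i].
picked = pd.Series(
    x_block.to_numpy()[np.arange(len(wide)), pos],
    index=wide.index,
    name="x_inferred_context",
)

# Cross-check: in long format the same values are the rows where
# context equals inferred_context.
expected = (
    long.loc[long["context"] == long["inferred_context"]]
    .set_index(["trial", "t"])["x"]
)
assert np.allclose(picked.to_numpy(), expected.reindex(picked.index).to_numpy())
print(picked)
```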

Solution 1
0 Accepted 2022-12-18 11:17:04
