Pandas indexed Series subset (of a DataFrame) not changing values

Question

I have the following table:

import datetime

import numpy as np
import pandas as pd

df = pd.DataFrame({'code':['A121','A121','A121','H812','H812','H812','Z198','Z198','Z198','S222','S222','S222'],
                   'mode':['stk','sup','cons','stk','sup','cons','stk','sup','cons','stk','sup','cons'],
                   datetime.date(year=2021,month=5,day=1):[4,2,np.nan,2,2,np.nan,6,np.nan,np.nan,np.nan,2,np.nan],
                   datetime.date(year=2021,month=5,day=2):[1,np.nan,np.nan,3,np.nan,np.nan,2,np.nan,np.nan,np.nan,np.nan,np.nan],
                   datetime.date(year=2021,month=5,day=3):[12,5,np.nan,13,5,np.nan,12,np.nan,np.nan,np.nan,5,np.nan],
                   datetime.date(year=2021,month=5,day=4):[np.nan,1,np.nan,np.nan,4,np.nan,np.nan,np.nan,np.nan,np.nan,7,np.nan]})
df = df.set_index('mode')

I want to achieve the following: I would like to set the cons rows based on an arithmetic calculation.

For each date and code, cons should be set to: prev_date stk - current_date stk + sup

I tried the following code:

dates = list(df.columns)
dates.remove('code')
for date in dates:
    prev_date = date - datetime.timedelta(days=1)
    if(df.loc["stk"].get(prev_date,None) is not None):
        opn_stk = df.loc["stk",prev_date].reset_index(drop=True)
        cls_stk = df.loc["stk",date].reset_index(drop=True)
        sup = df.loc["sup",date].fillna(0).reset_index(drop=True)
        cons = opn_stk - cls_stk + sup
        df.loc["cons",date] = cons

I don't get any error, but the cons values do not change at all.

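I suspect this is because df.loc["cons",date] is an indexed Series (labelled by mode), whereas the result of opn_stk - cls_stk + sup is a Series with a default integer index after reset_index(drop=True). Any idea how to fix this?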

PS: I am also using a loop to compute this. Is there a more efficient, vectorized way?

Expected output

Answer 1

Let's try a groupby apply instead:

def calc_cons(g):
    # Transpose
    t = g[g.columns[g.columns != 'code']].T
    # Update Cons
    g.loc[g.index == 'cons', g.columns != 'code'] = (-t['stk'].diff() +
                                                     t['sup'].fillna(0)).to_numpy()
    return g


df = df.groupby('code', as_index=False, sort=False).apply(calc_cons)
# print(df[df.index == 'cons'])
print(df)

      code  2021-05-01  2021-05-02  2021-05-03  2021-05-04
mode                                                      
stk   A121         4.0         1.0        12.0         NaN
sup   A121         2.0         NaN         5.0         1.0
cons  A121         NaN         3.0        -6.0         NaN
stk   H812         2.0         3.0        13.0         NaN
sup   H812         2.0         NaN         5.0         4.0
cons  H812         NaN        -1.0        -5.0         NaN
stk   Z198         6.0         2.0        12.0         NaN
sup   Z198         NaN         NaN         NaN         NaN
cons  Z198         NaN         4.0       -10.0         NaN
stk   S222         NaN         NaN         NaN         NaN
sup   S222         2.0         NaN         5.0         7.0
cons  S222         NaN         NaN         NaN         NaN

*This assumes the columns are sorted by date at 1-day intervals.
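The assumption above can be checked before relying on diff(). The following is only a minimal sketch: the helper name is_consecutive_daily is hypothetical (it is not part of pandas or of the answer), and it expects the date column labels as a list of datetime.date objects.

```python
import datetime


def is_consecutive_daily(date_cols):
    """Return True if each date is exactly one day after the previous one."""
    one_day = datetime.timedelta(days=1)
    return all(b - a == one_day for a, b in zip(date_cols, date_cols[1:]))


# Hypothetical inputs: one gap-free run of dates and one with a missing day.
dates_ok = [datetime.date(2021, 5, d) for d in (1, 2, 3, 4)]
dates_gap = [datetime.date(2021, 5, d) for d in (1, 2, 4)]

print(is_consecutive_daily(dates_ok))   # True
print(is_consecutive_daily(dates_gap))  # False
```

With the question's frame, the list to pass in would be the column labels other than 'code'.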

Answer 2

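Although @Henry Ecker's answer is very elegant, it is much slower than what I was doing (more than 10 times slower), so I decided to go ahead with my own implementation, fixed.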

My implementation, with the fix suggested by Henry Ecker, df.loc["cons",date] = cons.to_numpy():

dates = list(df.columns)
dates.remove('code')
for date in dates:
    prev_date = date - datetime.timedelta(days=1)
    if(df.loc["stk"].get(prev_date,None) is not None):
        opn_stk = df.loc["stk",prev_date].reset_index(drop=True)    # gets the stock of prev date
        cls_stk = df.loc["stk",date].reset_index(drop=True)         # gets the stock of current date
        sup = df.loc["sup",date].fillna(0).reset_index(drop=True)   # gets supply of current date
        cons = opn_stk - cls_stk + sup
        df.loc["cons",date] = cons.to_numpy()
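To illustrate why the .to_numpy() call matters, here is a small sketch on a reduced, hypothetical version of the table (two codes, string column labels d1 and d2 instead of dates). Assigning a Series through .loc aligns on index labels; since the labels 0 and 1 do not match the label 'cons', nothing usable is written. Assigning a NumPy array is positional instead.

```python
import numpy as np
import pandas as pd

# Reduced toy frame (illustrative labels, not the asker's full data).
df = pd.DataFrame({
    "mode": ["stk", "sup", "cons", "stk", "sup", "cons"],
    "code": ["A121", "A121", "A121", "H812", "H812", "H812"],
    "d1": [4.0, 2.0, np.nan, 2.0, 2.0, np.nan],
    "d2": [1.0, np.nan, np.nan, 3.0, np.nan, np.nan],
}).set_index("mode")

opn_stk = df.loc["stk", "d1"].reset_index(drop=True)
cls_stk = df.loc["stk", "d2"].reset_index(drop=True)
sup = df.loc["sup", "d2"].fillna(0).reset_index(drop=True)
cons = opn_stk - cls_stk + sup  # index is 0, 1 after reset_index

# Series assignment: labels 0, 1 are matched against 'cons', which fails.
aligned = df.copy()
aligned.loc["cons", "d2"] = cons

# Array assignment: values are written in row order, no label matching.
positional = df.copy()
positional.loc["cons", "d2"] = cons.to_numpy()

print(aligned.loc["cons", "d2"].isna().all())   # True
print(positional.loc["cons", "d2"].tolist())    # [3.0, -1.0]
```

The positional form relies on the stk, sup and cons rows appearing in the same code order, which holds for the table in the question.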

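By the way: my implementation runs in 0:00:00.053309 seconds on the full data (not this table, which I created as a toy example), while Henry Ecker's implementation runs in 0:00:00.568888 seconds, so it is more than 10 times slower.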

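This is probably because his approach iterates over codes, while mine iterates over dates. At any given point in time I have at most 30 dates, but there may be more than 500 codes.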
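For the vectorized variant asked about in the question, one possible sketch avoids both the loop over dates and the groupby over codes by applying diff along the column axis. It is not taken from either answer and rests on assumptions: the date columns are sorted at 1-day intervals, every code has exactly one stk, one sup and one cons row, and the frame below is a reduced, hypothetical two-code version of the table. Like the groupby answer, it leaves the first date as NaN.

```python
import datetime

import numpy as np
import pandas as pd

d = [datetime.date(2021, 5, day) for day in (1, 2, 3)]
df = pd.DataFrame({
    "code": ["A121", "A121", "A121", "H812", "H812", "H812"],
    "mode": ["stk", "sup", "cons", "stk", "sup", "cons"],
    d[0]: [4, 2, np.nan, 2, 2, np.nan],
    d[1]: [1, np.nan, np.nan, 3, np.nan, np.nan],
    d[2]: [12, 5, np.nan, 13, 5, np.nan],
}).set_index("mode")

date_cols = [c for c in df.columns if c != "code"]

# One row per code, one column per date.
stk = df.loc["stk"].set_index("code")[date_cols]
sup = df.loc["sup"].set_index("code")[date_cols].fillna(0)

# diff(axis=1) is current - previous, so negate it to get previous - current.
cons = -stk.diff(axis=1) + sup

# Reorder by the codes of the cons rows, then assign positionally.
cons_codes = df.loc["cons", "code"].to_numpy()
df.loc["cons", date_cols] = cons.reindex(cons_codes).to_numpy()

print(df.loc["cons", date_cols])
```

Whether this is faster than the date loop on the full data was not measured here; it would need to be timed on the real table.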

2 solutions

Solution 1
1 2021-05-07 14:26:58

Solution 2
-1 Accepted 2021-05-07 16:23:38
