
How to subtract values in a column using groupby

I have the following DataFrame:

ID  Days TreatmentGiven TreatmentNumber
--- ---- -------------- ---------------
1    0      False             NaN
1    30     False             NaN
1    40     True               1
1    56     False             NaN 
2    0      False             NaN
2    14     True               1
2    28     True               2 

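I want to create a new column that re-baselines Days within each ID, so that day 0 is the day of the first treatment (TreatmentNumber == 1). The result should look like this: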

ID  Days TreatmentGiven TreatmentNumber New_Baseline
--- ---- -------------- --------------- ------------
1    0      False             NaN          -40
1    30     False             NaN          -10
1    40     True               1            0
1    56     False             NaN           16
2    0      False             NaN          -14
2    14     True               1            0
2    28     True               2            14

What is the best way to do this?

Thank you.

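The idea is to filter the rows where TreatmentNumber equals 1 and turn them into a Series of Days indexed by ID. Series.map then looks up that value for every row's ID, and Series.sub subtracts it from the Days column: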

# Series of first-treatment days, indexed by ID
s = df[df['TreatmentNumber'].eq(1)].set_index('ID')['Days']
# Alternative: take the first row per ID where TreatmentGiven is True
#s = df[df['TreatmentGiven']].drop_duplicates('ID').set_index('ID')['Days']
# Map each row's ID to its first-treatment day and subtract it from Days
df['New_Baseline'] = df['Days'].sub(df['ID'].map(s))
print(df)
   ID  Days  TreatmentGiven  TreatmentNumber  New_Baseline
0   1     0           False              NaN           -40
1   1    30           False              NaN           -10
2   1    40            True              1.0             0
3   1    56           False              NaN            16
4   2     0           False              NaN           -14
5   2    14            True              1.0             0
6   2    28            True              2.0            14

Details

print (s)
ID
1    40
2    14
Name: Days, dtype: int64

print (df['ID'].map(s))
0    40
1    40
2    40
3    40
4    14
5    14
6    14
Name: ID, dtype: int64
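
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The DataFrame is rebuilt from the table in the question, and the variable name first_day is only my choice for illustration. It assumes each ID has at most one row with TreatmentNumber == 1, so the lookup Series has a unique index.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2],
    "Days": [0, 30, 40, 56, 0, 14, 28],
    "TreatmentGiven": [False, False, True, False, False, True, True],
    "TreatmentNumber": [np.nan, np.nan, 1, np.nan, np.nan, 1, 2],
})

# One entry per ID: the day on which treatment number 1 was given.
# Assumption: at most one such row per ID, so the index is unique.
first_day = df.loc[df["TreatmentNumber"].eq(1)].set_index("ID")["Days"]

# Look up that day for every row's ID, then subtract it from Days.
df["New_Baseline"] = df["Days"].sub(df["ID"].map(first_day))

print(df["New_Baseline"].tolist())  # [-40, -10, 0, 16, -14, 0, 14]
```

An ID that never received treatment 1 has no entry in first_day, so map returns NaN for its rows and New_Baseline is NaN there as well.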

Here is one approach using Series.where + groupby + transform:

s = df['Days'].where(df['TreatmentGiven']).groupby(df['ID']).transform('first')
df['New_Baseline'] = df['Days'].sub(s)

Output

   ID  Days  TreatmentGiven  TreatmentNumber  New_Baseline
0   1     0           False              NaN         -40.0
1   1    30           False              NaN         -10.0
2   1    40            True              1.0           0.0
3   1    56           False              NaN          16.0
4   2     0           False              NaN         -14.0
5   2    14            True              1.0           0.0
6   2    28            True              2.0          14.0
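
Note that New_Baseline comes out as float here: where replaces the untreated rows with NaN, which turns the column into floats. A rough sketch of the intermediate steps follows, with an optional cast back to integers at the end. The names treated_days and baseline_day are mine, and using the nullable "Int64" dtype is just one option, not part of the answer above.

```python
import pandas as pd

# Only the columns this approach needs, copied from the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2],
    "Days": [0, 30, 40, 56, 0, 14, 28],
    "TreatmentGiven": [False, False, True, False, False, True, True],
})

# Keep Days only on treated rows; all other rows become NaN (float dtype).
treated_days = df["Days"].where(df["TreatmentGiven"])

# 'first' takes the first non-null value per group, and transform
# broadcasts it back to every row of that ID.
baseline_day = treated_days.groupby(df["ID"]).transform("first")

# Optional: cast to the nullable integer dtype so the result stays integral
# while still allowing missing values for IDs without any treatment.
df["New_Baseline"] = df["Days"].sub(baseline_day).astype("Int64")

print(df)
```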

Here is another approach:

# First treatment day per ID
aux = df[df['TreatmentGiven']].groupby('ID')['Days'].first().reset_index()

# Both frames have a Days column, so merge adds the _x/_y suffixes; rename them
df = df.merge(aux, how='left', on='ID').rename(columns={'Days_x':'Days','Days_y':'New_baseline'})
df['New_baseline'] = df['Days'] - df['New_baseline']

Output:

     ID Days    TreatmentGiven  TreatmentNumber New_baseline
 0    1    0             False              NaN          -40
 1    1   30             False              NaN          -10
 2    1   40              True              1.0            0
 3    1   56             False              NaN           16
 4    2    0             False              NaN          -14
 5    2   14              True              1.0            0
 6    2   28              True              2.0           14
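
A possible variation on the merge approach, sketched under the same sample data: rename the helper column before merging so there are no _x/_y suffixes to clean up afterwards. The column name FirstTreatmentDay and the variable names are hypothetical, chosen only for this example.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2],
    "Days": [0, 30, 40, 56, 0, 14, 28],
    "TreatmentGiven": [False, False, True, False, False, True, True],
    "TreatmentNumber": [np.nan, np.nan, 1, np.nan, np.nan, 1, 2],
})

# First treated day per ID, stored under a distinct (hypothetical) column name.
first_days = (
    df.loc[df["TreatmentGiven"]]
      .groupby("ID", as_index=False)["Days"]
      .first()
      .rename(columns={"Days": "FirstTreatmentDay"})
)

# A left merge keeps every original row in its original order.
merged = df.merge(first_days, how="left", on="ID")
merged["New_Baseline"] = merged["Days"] - merged["FirstTreatmentDay"]
merged = merged.drop(columns="FirstTreatmentDay")

print(merged)
```

With how='left', an ID that has no treated rows would still be kept, and its New_Baseline would be NaN.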

