How to calculate conditional percent change between rows in pandas?
Here's my dataframe:
import pandas as pd

df = pd.DataFrame({'Period': ['1_Baseline', '1_Baseline', '1_Baseline', '2_Acute', '2_Acute', '2_Acute', '3_Chronic', '3_Chronic', '3_Chronic', '4_Discontinuation', '4_Discontinuation', '4_Discontinuation'],
                   'Subject': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'Amount': [24, 52, 34, 95, 98, 54, 32, 20, 16, 52, 34, 95]})
I want to create a column that contains the percent change in Amount within each Subject, for each Period, relative to Baseline. So, for subject 1, it would show how much the Amount changes from 1_Baseline to 2_Acute, from 1_Baseline to 3_Chronic, and from 1_Baseline to 4_Discontinuation. It would do the same thing for each subject.
Here's what I tried:
df['pct_change'] = df.groupby(['Period'])['Amount'].pct_change()
But I get:
Period Subject Amount pct_change
0 1_Baseline 1 24 NaN
1 1_Baseline 2 52 1.166667
2 1_Baseline 3 34 -0.346154
3 2_Acute 1 95 1.794118
4 2_Acute 2 98 0.031579
5 2_Acute 3 54 -0.448980
6 3_Chronic 1 32 -0.407407
7 3_Chronic 2 20 -0.375000
8 3_Chronic 3 16 -0.200000
9 4_Discontinuation 1 52 2.250000
10 4_Discontinuation 2 34 -0.346154
11 4_Discontinuation 3 95 1.794118
The results are not calculated within each Period, and are not relative to each Subject's previous Amount.
Expected output:
Period Subject Amount pct_change
0 1_Baseline 1 24 NaN
1 1_Baseline 2 52 NaN
2 1_Baseline 3 34 NaN
3 2_Acute 1 95 2.958333333
4 2_Acute 2 98 0.884615385
5 2_Acute 3 54 0.588235294
6 3_Chronic 1 32 0.333333333
7 3_Chronic 2 20 -0.615384615
8 3_Chronic 3 16 -0.529411765
9 4_Discontinuation 1 52 1.166666667
10 4_Discontinuation 2 34 -0.346153846
11 4_Discontinuation 3 95 1.794117647
IIUC, you want to divide Amount at every row with Subject==2 by the Amount at Period==1_Baseline and Subject==2, and likewise for every other subject. Here's my approach:
s = df.set_index(['Subject', 'Period']).Amount.unstack('Period')
df['pct_change'] = (s.div(s['1_Baseline'], axis='rows').sub(1)
.unstack().values
)
Output:
Period Subject Amount pct_change
0 1_Baseline 1 24 0.000000
1 1_Baseline 2 52 0.000000
2 1_Baseline 3 34 0.000000
3 2_Acute 1 95 2.958333
4 2_Acute 2 98 0.884615
5 2_Acute 3 54 0.588235
6 3_Chronic 1 32 0.333333
7 3_Chronic 2 20 -0.615385
8 3_Chronic 3 16 -0.529412
9 4_Discontinuation 1 52 1.166667
10 4_Discontinuation 2 34 -0.346154
11 4_Discontinuation 3 95 1.794118
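To make the reshaping step concrete, here is a small sketch that splits the one-liner above into its intermediate pieces. The names ratio and long are only illustrative and are not part of the original answer; the sample data is the dataframe from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Period': ['1_Baseline'] * 3 + ['2_Acute'] * 3
              + ['3_Chronic'] * 3 + ['4_Discontinuation'] * 3,
    'Subject': [1, 2, 3] * 4,
    'Amount': [24, 52, 34, 95, 98, 54, 32, 20, 16, 52, 34, 95],
})

# Wide table: one row per Subject, one column per Period.
s = df.set_index(['Subject', 'Period'])['Amount'].unstack('Period')

# Divide every Period column by the baseline column (aligned on Subject),
# then subtract 1 to turn the ratio into a relative change.
ratio = s.div(s['1_Baseline'], axis='index').sub(1)

# Unstacking the wide table gives a Series indexed by (Period, Subject):
# all subjects of the first period, then all subjects of the second, and so on.
long = ratio.unstack()
print(long)
```

Because long is ordered by Period and then Subject, its raw values only line up with df when df is sorted the same way.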
Note that the order of the rows is very important: assigning the raw .values back only works when df is sorted by Period and then by Subject, which is the order that unstack() produces. In this case, you do have the correct row order for this to work. If you are not certain about the order, then it's safer to merge:
s = df.set_index(['Subject', 'Period']).Amount.unstack('Period')
s = s.div(s['1_Baseline'], axis='rows').sub(1).unstack().reset_index(name='pct_change')
# merge returns a new DataFrame, so assign it back to keep the new column
df = df.merge(s, on=['Period','Subject'], how='left')
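As a possible alternative that also does not depend on row order, each row's baseline can be looked up by Subject with map. This is only a sketch, not the approach from the answer above, and it assumes that every Subject has exactly one 1_Baseline row. The names is_baseline and baseline are illustrative. It also blanks out the baseline rows so they show NaN, as in the expected output, rather than 0.

```python
import pandas as pd

df = pd.DataFrame({
    'Period': ['1_Baseline'] * 3 + ['2_Acute'] * 3
              + ['3_Chronic'] * 3 + ['4_Discontinuation'] * 3,
    'Subject': [1, 2, 3] * 4,
    'Amount': [24, 52, 34, 95, 98, 54, 32, 20, 16, 52, 34, 95],
})

# Assumption: exactly one '1_Baseline' row per Subject, so the lookup
# Series below has a unique index.
is_baseline = df['Period'].eq('1_Baseline')
baseline = df.loc[is_baseline].set_index('Subject')['Amount']

# Map each row's Subject to its baseline Amount, then compute the change.
change = df['Amount'] / df['Subject'].map(baseline) - 1

# Replace the baseline rows (which would be 0) with NaN.
df['pct_change'] = change.mask(is_baseline)
print(df)
```

Since the lookup is keyed on Subject rather than on position, shuffling the rows of df before running this should give the same value for each (Period, Subject) pair.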