How to calculate conditional percent change between rows in pandas?

Here's my dataframe:

import pandas as pd

df = pd.DataFrame({'Period': ['1_Baseline', '1_Baseline', '1_Baseline', '2_Acute', '2_Acute', '2_Acute', '3_Chronic', '3_Chronic', '3_Chronic', '4_Discontinuation', '4_Discontinuation', '4_Discontinuation'],
                   'Subject': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'Amount': [24, 52, 34, 95, 98, 54, 32, 20, 16, 52, 34, 95]})

I want to create a column that contains the percent change in Amount within each Subject, for each Period, relative to Baseline. So, for subject 1, it would show how much Amount changes from 1_Baseline to 2_Acute, from 1_Baseline to 3_Chronic, and from 1_Baseline to 4_Discontinuation. It would do the same thing for each subject.
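To make the target calculation concrete, here is a small worked example for subject 1 going from 1_Baseline (Amount 24) to 2_Acute (Amount 95). The variable names are only illustrative and are not part of the dataframe above.

```python
# Amounts for Subject 1, taken from the dataframe in the question
baseline_amount = 24   # Period == '1_Baseline'
acute_amount = 95      # Period == '2_Acute'

# Percent change relative to baseline, expressed as a fraction
pct_change = (acute_amount - baseline_amount) / baseline_amount
print(round(pct_change, 6))  # 2.958333
```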

Here's what I tried:

df['pct_change'] = df.groupby(['Period'])['Amount'].pct_change()

But I get:

               Period  Subject  Amount  pct_change
0          1_Baseline        1      24         NaN
1          1_Baseline        2      52    1.166667
2          1_Baseline        3      34   -0.346154
3             2_Acute        1      95    1.794118
4             2_Acute        2      98    0.031579
5             2_Acute        3      54   -0.448980
6           3_Chronic        1      32   -0.407407
7           3_Chronic        2      20   -0.375000
8           3_Chronic        3      16   -0.200000
9   4_Discontinuation        1      52    2.250000
10  4_Discontinuation        2      34   -0.346154
11  4_Discontinuation        3      95    1.794118

The results are not calculated within each Period, and are not relative to each Subject's previous Amount.

Expected output:

               Period  Subject  Amount    pct_change
0          1_Baseline        1      24           NaN
1          1_Baseline        2      52           NaN
2          1_Baseline        3      34           NaN
3             2_Acute        1      95   2.958333333
4             2_Acute        2      98   0.884615385
5             2_Acute        3      54   0.588235294
6           3_Chronic        1      32   0.333333333
7           3_Chronic        2      20  -0.615384615
8           3_Chronic        3      16  -0.529411765
9   4_Discontinuation        1      52   1.166666667
10  4_Discontinuation        2      34  -0.346153846
11  4_Discontinuation        3      95   1.794117647

IIUC, you want to divide Amount in every row by the Amount of the same Subject at Period==1_Baseline. For example, every row with Subject==2 is divided by the Amount at Period==1_Baseline and Subject==2. Here's my approach:

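# Pivot to one row per Subject and one column per Period
s = df.set_index(['Subject', 'Period']).Amount.unstack('Period')
# Divide every Period column by the baseline column, subtract 1,
# then flatten back to a 1-D array in (Period, Subject) order
df['pct_change'] = (s.div(s['1_Baseline'], axis='rows').sub(1)
                    .unstack().values
                   )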

Output:

               Period  Subject  Amount  pct_change
0          1_Baseline        1      24    0.000000
1          1_Baseline        2      52    0.000000
2          1_Baseline        3      34    0.000000
3             2_Acute        1      95    2.958333
4             2_Acute        2      98    0.884615
5             2_Acute        3      54    0.588235
6           3_Chronic        1      32    0.333333
7           3_Chronic        2      20   -0.615385
8           3_Chronic        3      16   -0.529412
9   4_Discontinuation        1      52    1.166667
10  4_Discontinuation        2      34   -0.346154
11  4_Discontinuation        3      95    1.794118
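The output above has 0.0 in the 1_Baseline rows, whereas the expected output in the question has NaN there. A minimal sketch of one way to blank those rows out, assuming the same dataframe and the same row order as in the question, is to apply `Series.mask` after the calculation:

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    'Period': ['1_Baseline'] * 3 + ['2_Acute'] * 3
              + ['3_Chronic'] * 3 + ['4_Discontinuation'] * 3,
    'Subject': [1, 2, 3] * 4,
    'Amount': [24, 52, 34, 95, 98, 54, 32, 20, 16, 52, 34, 95],
})

# The approach from the answer: pivot, divide by baseline, flatten
s = df.set_index(['Subject', 'Period'])['Amount'].unstack('Period')
df['pct_change'] = s.div(s['1_Baseline'], axis='rows').sub(1).unstack().values

# Replace the baseline rows (always 0.0 by construction) with NaN
df['pct_change'] = df['pct_change'].mask(df['Period'] == '1_Baseline')

print(df)
```

This still relies on the rows being ordered by Period and then Subject, just like the original snippet.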

Note that the order of the rows is very important. In this case, you do have the correct row order for this to work. If you are not certain about the order, then it's safer to merge:

s = df.set_index(['Subject', 'Period']).Amount.unstack('Period')
s = s.div(s['1_Baseline'], axis='rows').sub(1).unstack().reset_index(name='pct_change')

df.merge(s, on=['Period','Subject'], how='left')
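As a possible alternative that also does not depend on row order, the baseline Amount can be looked up per Subject with `Series.map`. This is only a sketch, not part of the original answer: the names `shuffled` and `baseline` are hypothetical, and it assumes each Subject has exactly one 1_Baseline row.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    'Period': ['1_Baseline'] * 3 + ['2_Acute'] * 3
              + ['3_Chronic'] * 3 + ['4_Discontinuation'] * 3,
    'Subject': [1, 2, 3] * 4,
    'Amount': [24, 52, 34, 95, 98, 54, 32, 20, 16, 52, 34, 95],
})

# Shuffle the rows to show that the result does not depend on their order
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)

# Lookup table: Subject -> Amount at baseline (hypothetical helper Series)
baseline = (shuffled.loc[shuffled['Period'] == '1_Baseline']
            .set_index('Subject')['Amount'])

# Divide each row's Amount by its own Subject's baseline Amount
shuffled['pct_change'] = shuffled['Amount'] / shuffled['Subject'].map(baseline) - 1

print(shuffled.sort_values(['Period', 'Subject']))
```

Because the lookup is keyed on Subject, each row is matched to its baseline by value and not by position.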
