In Pandas, how to do arithmetic calculations on specific groups of consecutive rows
In the following code, I'd like to calculate the total percentage change for Value only where Code is 'b'. The expected answer is 0.6 (which is 3/4 * 8/10).
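Reading the expected answer, the quantity being asked for appears to be computed per run r of consecutive 'b' rows: take the run's first Value divided by its last Value, then multiply those ratios together. With the runs (3, 4) and (8, 9, 10) this gives:

$$
\prod_{r} \frac{v_{\text{first}}(r)}{v_{\text{last}}(r)} = \frac{3}{4} \cdot \frac{8}{10} = 0.75 \times 0.8 = 0.6
$$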
import pandas as pd
import numpy as np
x = pd.DataFrame({'Code':['a', 'a', 'a', 'b', 'b', 'a', 'a', 'a', 'b', 'b', 'b', 'a', 'a'], 'Value': np.arange(13)})
   Code  Value
0     a      0
1     a      1
2     a      2
3     b      3
4     b      4
5     a      5
6     a      6
7     a      7
8     b      8
9     b      9
10    b     10
11    a     11
12    a     12
I tried df.groupby, but because there are two separate groups of 'b', it does not do what I expected.
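To see the problem concretely, here is a small sketch (rebuilding the same x as above) of what grouping by Code alone does: both runs of 'b' land in a single group, so "first" and "last" span the gap between them.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({'Code': ['a', 'a', 'a', 'b', 'b', 'a', 'a', 'a', 'b', 'b', 'b', 'a', 'a'],
                  'Value': np.arange(13)})

# Grouping by Code alone merges rows 3-4 and rows 8-10 into one 'b' group.
by_code = x.groupby('Code').Value.agg(['first', 'last'])
print(by_code)

# For 'b' this compares row 3 with row 10: 3 / 10 = 0.3, not 3/4 * 8/10 = 0.6.
ratio_b = by_code.loc['b', 'first'] / by_code.loc['b', 'last']
print(ratio_b)
```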
Thank you very much for your time in advance.
What you're trying to calculate requires you to group consecutive rows.
Notice that grouping consecutive rows means grouping data based on a property of the index. A common and very flexible trick in cases like this is to introduce a new column that stores the property of the index you care about.
In this case, you can track in a column how many times the value in the Code column has changed between consecutive rows:
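# code_changed is True wherever Code differs from the row above (row 0 is compared with NaN, so it is True);
# its cumulative sum gives each run of identical codes its own label.
(x.assign(code_changed=lambda df: df.Code != df.Code.shift(),
          ordered_code=lambda df: df.code_changed.cumsum()))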
   Code  Value  code_changed  ordered_code
0     a      0          True             1
1     a      1         False             1
2     a      2         False             1
3     b      3          True             2
4     b      4         False             2
5     a      5          True             3
6     a      6         False             3
7     a      7         False             3
8     b      8          True             4
9     b      9         False             4
10    b     10         False             4
11    a     11          True             5
12    a     12         False             5
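As a quick check that ordered_code really separates the two runs of 'b', the sketch below collapses each run into one row holding its Code and its first and last Value. It uses pandas named aggregation; the output column names first and last are just illustrative choices.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({'Code': ['a', 'a', 'a', 'b', 'b', 'a', 'a', 'a', 'b', 'b', 'b', 'a', 'a'],
                  'Value': np.arange(13)})

runs = x.assign(code_changed=lambda df: df.Code != df.Code.shift(),
                ordered_code=lambda df: df.code_changed.cumsum())

# One row per run of consecutive equal codes, with the run's first and last Value.
summary = runs.groupby('ordered_code').agg(Code=('Code', 'first'),
                                           first=('Value', 'first'),
                                           last=('Value', 'last'))
print(summary)
```

Runs 2 and 4 are the two 'b' runs, spanning Values 3 to 4 and 8 to 10 respectively.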
The ordered_code column contains exactly the grouping information you're looking for. You can then reach the output you're hoping for by restricting to rows with Code equal to 'b' and aggregating the Values:
(x.assign(code_changed=lambda df: df.Code != df.Code.shift(),
          ordered_code=lambda df: df.code_changed.cumsum())
  .pipe(lambda df: df[df.Code == 'b'])                   # keep only the 'b' rows
  .groupby('ordered_code')                               # one group per run of 'b'
  .Value
  .agg(lambda values: values.iloc[0] / values.iloc[-1])  # first / last within each run
  .prod())                                               # multiply the per-run ratios
This outputs
0.6000000000000001
as desired (the trailing 1 is ordinary floating-point rounding of 0.75 * 0.8).
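If you need the same calculation for other codes, the idea can be wrapped in a small function. This is a sketch: the name run_ratio_product is hypothetical, and instead of adding helper columns it passes the run labels to groupby as a separate Series built with ne, shift and cumsum, which pandas aligns with the selected rows by index.

```python
import numpy as np
import pandas as pd


def run_ratio_product(df: pd.DataFrame, code: str) -> float:
    """Hypothetical helper: product of first/last Value over each run of `code`."""
    # Label each run of consecutive identical codes (1, 2, 3, ...).
    run_id = df['Code'].ne(df['Code'].shift()).cumsum()
    mask = df['Code'] == code
    # Group only the selected rows by their run label; both objects share the
    # same index, so run_id[mask] lines up with df.loc[mask, 'Value'].
    per_run = (df.loc[mask, 'Value']
                 .groupby(run_id[mask])
                 .agg(lambda v: v.iloc[0] / v.iloc[-1]))
    # Multiply the per-run ratios (an empty selection gives 1.0).
    return float(per_run.prod())


x = pd.DataFrame({'Code': ['a', 'a', 'a', 'b', 'b', 'a', 'a', 'a', 'b', 'b', 'b', 'a', 'a'],
                  'Value': np.arange(13)})

result = run_ratio_product(x, 'b')
print(result)  # 0.6000000000000001
```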
Alternatively, compute pct_change on only the Value column of the rows where Code is 'b', then take whichever values you need from it and multiply them together.
pct_change = x.loc[x['Code'] == 'b', 'Value'].pct_change()
Multiply the first and third non-NaN values (positions 1 and 3, since position 0 is NaN):
pct_change.iloc[1] * pct_change.iloc[3]
Or, if you need more values, you can write a loop to pick out different rows of pct_change.
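As a sanity check on this second approach, here is a sketch using the question's x. It assumes the intended selection is the Value column of the 'b' rows. Note that the resulting product is about 1/24 rather than the 0.6 from the question: pct_change measures row-to-row changes (4/3 - 1 and 9/8 - 1 here), and it does not know where one run of 'b' ends and the next begins.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({'Code': ['a', 'a', 'a', 'b', 'b', 'a', 'a', 'a', 'b', 'b', 'b', 'a', 'a'],
                  'Value': np.arange(13)})

# pct_change over the 'b' rows only (Values 3, 4, 8, 9, 10); position 0 is NaN.
pct = x.loc[x['Code'] == 'b', 'Value'].pct_change()
print(pct.tolist())

# Positions 1 and 3 are the first and third non-NaN changes: (4/3 - 1) * (9/8 - 1).
product = pct.iloc[1] * pct.iloc[3]
print(product)

# Position 2 compares row 4 (Value 4, end of the first run) with row 8
# (Value 8, start of the second run): 8/4 - 1 = 1.0.
print(pct.iloc[2])
```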