How to groupby and aggregate dynamic columns in pandas
I have the following dataframe in pandas:
code tank nozzle_1 nozzle_2 nozzle_var nozzle_sale
123 1 1 1 10 10
123 1 2 2 12 10
123 2 1 1 10 10
123 2 2 2 12 10
123 1 1 1 10 10
123 2 2 2 12 10
Now I want to generate the cumulative sum of the columns, grouped by tank, and take the last observation of each group. The nozzle_1 and nozzle_2 columns are dynamic; there could also be nozzle_3, nozzle_4, ..., nozzle_n. I am doing the following in pandas to get the cumsum:
## Below code for calculating cumsum of dynamic columns nozzle_1 and nozzle_2
cols = df.columns[df.columns.str.contains(pat=r'nozzle_\d+$', regex=True)]
## assign returns a new frame, so keep the result
df = df.assign(**df.groupby('tank')[cols].agg(['cumsum'])
               .pipe(lambda x: x.set_axis(x.columns.map('_'.join), axis=1)))
## nozzle_sale_cumsum is static column
df['nozzle_sale_cumsum'] = df.groupby('tank')['nozzle_sale'].cumsum()
From the above code I get the cumsum of the following columns:
tank nozzle_1 nozzle_2 nozzle_var nozzle_1_cumsum nozzle_2_cumsum nozzle_sale_cumsum
1 1 1 10 1 1 10
1 2 2 12 3 3 20
2 1 1 10 1 1 10
2 2 2 12 3 3 20
1 1 1 10 4 4 30
2 2 2 12 5 5 30
Now I want to get the last values of all three cumsum columns, grouped by tank. I can do it with the following code in pandas, but the column names are hard coded.
final_df= df.groupby('tank').agg({'nozzle_1_cumsum':'last',
'nozzle_2_cumsum':'last',
'nozzle_sale_cumsum':'last',
}).reset_index()
The problem with the above code is that nozzle_1_cumsum and nozzle_2_cumsum are hard coded, whereas the actual set of nozzle columns varies. How can I do this in pandas with dynamic columns?
How about:
df.filter(regex='_cumsum').groupby(df['tank']).last()
Output:
nozzle_1_cumsum nozzle_2_cumsum nozzle_sale_cumsum
tank
1 4 4 30
2 5 5 30
You can also replace df.filter(...) with, e.g., df.iloc[:, -3:] or df[col_names].
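For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample data from the question and builds the column list dynamically instead of hard coding it. The names nozzle_cols, sum_cols and cumsums are only illustrative, and add_suffix plus join is used here in place of the assign/pipe chain from the question; treat it as one possible way to do it, not the only one.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "code": [123] * 6,
    "tank": [1, 1, 2, 2, 1, 2],
    "nozzle_1": [1, 2, 1, 2, 1, 2],
    "nozzle_2": [1, 2, 1, 2, 1, 2],
    "nozzle_var": [10, 12, 10, 12, 10, 12],
    "nozzle_sale": [10, 10, 10, 10, 10, 10],
})

# Dynamic nozzle_<n> columns plus the static nozzle_sale column.
nozzle_cols = df.columns[df.columns.str.contains(r"nozzle_\d+$", regex=True)]
sum_cols = list(nozzle_cols) + ["nozzle_sale"]

# Per-tank running totals, renamed with a _cumsum suffix and joined on the index.
cumsums = df.groupby("tank")[sum_cols].cumsum().add_suffix("_cumsum")
df = df.join(cumsums)

# Pick the cumsum columns by name, then take the last row of each tank.
col_names = [c for c in df.columns if c.endswith("_cumsum")]
final_df = df[col_names].groupby(df["tank"]).last().reset_index()
print(final_df)
```

With the sample data this should give the same per-tank values as the output shown above (4, 4, 30 for tank 1 and 5, 5, 30 for tank 2), with tank as a regular column because of reset_index().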