Pivot over multiple tables with pandas
I want to create a pivot with average values over multiple tables. Here is an example of what I want to create: the inputs are df1 and df2, and res is the result I want to calculate from df1 and df2.
import pandas as pd
import numpy as np
df1 = pd.DataFrame({"2000": ["A", "A", "B"],
                    "2001": ["A", "B", "B"],
                    "2002": ["B", "B", "B"]},
                   index=['Item1', 'Item2', 'Item3'])
df2 = pd.DataFrame({"2000": [0.5, 0.7, 0.1],
                    "2001": [0.6, 0.6, 0.3],
                    "2002": [0.7, 0.4, 0.2]},
                   index=['Item1', 'Item2', 'Item3'])
display(df1)
display(df2)
res = pd.DataFrame({"2000": [0.6, 0.1],
                    "2001": [0.6, 0.45],
                    "2002": [np.nan, 0.43]},
                   index=['A', 'B'])
display(res)
Both dataframes have years as columns, and each row is an item. The items change state over time; the state is defined in df1. They also have a value each year, defined in df2. I want to calculate the average value by year for each group of states (A, B).
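Stated as a formula (the notation is introduced here only for illustration), write I(s, y) for the set of items whose state in df1 is s in year y. Each cell of res is then the plain average of the matching df2 values:

```latex
\mathrm{res}_{s,y} = \frac{1}{\lvert I_{s,y} \rvert} \sum_{i \in I_{s,y}} \mathrm{df2}_{i,y},
\qquad I_{s,y} = \{\, i : \mathrm{df1}_{i,y} = s \,\}
```

For example, in 2002 all three items are in state B, so res[B, 2002] = (0.7 + 0.4 + 0.2) / 3 = 1.3 / 3 ≈ 0.433 (shown rounded as 0.43), while no item is in state A that year, so the set is empty and res[A, 2002] is NaN.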
I have not managed to calculate res. Any suggestions?
To solve this problem, you should first combine both DataFrames into one. For example, you can use this code to convert each DataFrame from wide to long format, join them on their shared (year, item) index, and finally reset the index so that year and item become ordinary columns usable in the pivot:
df_full = pd.concat([df1.unstack(), df2.unstack()], axis=1).reset_index()
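To see what this step produces, here is a small self-contained sketch that rebuilds the example inputs and prints the long table. DataFrame.unstack() on a frame with a flat index returns a Series keyed by (column, row), i.e. (year, item), so both Series share the same keys and pd.concat(..., axis=1) places them side by side. Because neither the index levels nor the Series are named, the columns come out as level_0, level_1, 0 and 1, which is why the rename step follows.

```python
import pandas as pd

# Example inputs from the question: states (df1) and values (df2) per item and year
df1 = pd.DataFrame({"2000": ["A", "A", "B"],
                    "2001": ["A", "B", "B"],
                    "2002": ["B", "B", "B"]},
                   index=["Item1", "Item2", "Item3"])
df2 = pd.DataFrame({"2000": [0.5, 0.7, 0.1],
                    "2001": [0.6, 0.6, 0.3],
                    "2002": [0.7, 0.4, 0.2]},
                   index=["Item1", "Item2", "Item3"])

# unstack() turns each wide frame into a Series keyed by (year, item);
# concat(axis=1) aligns the two Series on that key and puts them side by side
df_full = pd.concat([df1.unstack(), df2.unstack()], axis=1).reset_index()
print(df_full)
```

Each of the 9 rows holds one (year, item) pair with its state and its value.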
Then, if you want, you can rename the columns to make the pivot clearer:
df_full = df_full.rename(columns={'level_0': 'year', 'level_1': 'item', 0: 'DF1', 1:'DF2'})
Finally, build the pivot table:
res_out = pd.pivot_table(data=df_full, index='DF1', columns='year', values='DF2', aggfunc='mean')
It's not a one-line solution, but it works. The complete code:
df_full = pd.concat([df1.unstack(), df2.unstack()], axis=1).reset_index()
df_full = df_full.rename(columns={'level_0': 'year', 'level_1': 'item', 0: 'DF1', 1:'DF2'})
res_out = pd.pivot_table(data=df_full, index='DF1', columns='year', values='DF2', aggfunc='mean')
display(res_out)
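pivot_table here groups the rows of df_full by the pair (DF1, year), averages DF2 within each group, and spreads year across the columns. As a rough sketch of that behaviour, the same aggregation can be written as groupby plus unstack; combinations that never occur (state A in 2002) come out as NaN in both versions.

```python
import pandas as pd

df1 = pd.DataFrame({"2000": ["A", "A", "B"],
                    "2001": ["A", "B", "B"],
                    "2002": ["B", "B", "B"]},
                   index=["Item1", "Item2", "Item3"])
df2 = pd.DataFrame({"2000": [0.5, 0.7, 0.1],
                    "2001": [0.6, 0.6, 0.3],
                    "2002": [0.7, 0.4, 0.2]},
                   index=["Item1", "Item2", "Item3"])

# Long table with readable column names, as in the answer
df_full = pd.concat([df1.unstack(), df2.unstack()], axis=1).reset_index()
df_full = df_full.rename(columns={"level_0": "year", "level_1": "item", 0: "DF1", 1: "DF2"})

# pivot_table: group by (state, year), average the values, years become columns
res_out = pd.pivot_table(data=df_full, index="DF1", columns="year", values="DF2", aggfunc="mean")

# The same aggregation spelled out with groupby + unstack
res_gb = df_full.groupby(["DF1", "year"])["DF2"].mean().unstack()

print(res_out)
print(res_gb)
```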
This code, using stack, join and unstack, should also work:
df1_long = df1.stack().to_frame().rename({0:'category'}, axis=1)
df2_long = df2.stack().to_frame().rename({0:'values'}, axis=1)
joined_data = df1_long.join(df2_long).reset_index().rename({'level_0':'item','level_1':'year'}, axis=1)
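# average only the numeric 'values' column; averaging the string 'item' column raises a TypeError in pandas >= 2.0
res = joined_data.groupby(['category', 'year'])['values'].mean().unstack()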
display(res)
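A different route, not taken from either answer, avoids reshaping altogether. Assuming df1 and df2 always share the same index and columns (as in the example), df1 == state gives a boolean mask of the same shape, and df2.where(mask) keeps only the values belonging to that state, turning everything else into NaN. Column means skip NaN by default, so each year is averaged over the matching items only. A minimal sketch:

```python
import pandas as pd

df1 = pd.DataFrame({"2000": ["A", "A", "B"],
                    "2001": ["A", "B", "B"],
                    "2002": ["B", "B", "B"]},
                   index=["Item1", "Item2", "Item3"])
df2 = pd.DataFrame({"2000": [0.5, 0.7, 0.1],
                    "2001": [0.6, 0.6, 0.3],
                    "2002": [0.7, 0.4, 0.2]},
                   index=["Item1", "Item2", "Item3"])

# Sorted list of the distinct states appearing anywhere in df1
states = sorted(pd.unique(df1.to_numpy().ravel()))

# For each state, hide the df2 values of other states, then average each year (NaN skipped);
# the dict gives one column per state, so transpose to get states as rows
res_mask = pd.DataFrame({s: df2.where(df1 == s).mean() for s in states}).T
print(res_mask)
```

A year in which no item has a given state (A in 2002) averages an all-NaN column and therefore yields NaN, matching res.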