Pivot over multiple tables in pandas

Question

I want to create a pivot with average values computed across multiple tables. Here is an example: the inputs are df1 and df2, and res is the result I want to calculate from them.

import pandas as pd
import numpy as np

df1 = pd.DataFrame({"2000": ["A", "A", "B"],
                   "2001": ["A", "B", "B"],
                   "2002": ["B", "B", "B"]},
                   index =['Item1', 'Item2', 'Item3'])

df2 = pd.DataFrame({"2000": [0.5, 0.7, 0.1],
                   "2001": [0.6, 0.6, 0.3],
                   "2002": [0.7, 0.4, 0.2]},
                   index =['Item1', 'Item2', 'Item3'])

display(df1)
display(df2)

res = pd.DataFrame({"2000": [0.6, 0.1],
                   "2001": [0.6, 0.45], 
                   "2002": [np.nan, 0.43]},
                   index =['A', 'B'])

display(res)

Both DataFrames have years as columns, and each row is an item. The items change state over time: df1 holds each item's state per year, and df2 holds each item's value per year. I want to calculate the average value per year for each state group (A, B).

I have not managed to calculate res. Any suggestions?

Answer 1

To solve this, first combine both DataFrames into one. For example, the code below converts each DataFrame from wide to long format with unstack(), concatenates the two along their shared (year, item) index, and then resets the index so that year and item become ordinary columns the pivot can use:

df_full = pd.concat([df1.unstack(), df2.unstack()], axis=1).reset_index()
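To see what the wide-to-long step produces, here is a minimal sketch on a tiny hypothetical table (the name wide and its contents are made up for illustration and are not part of the question's data):

```python
import pandas as pd

# Tiny wide table: rows are items, columns are years (hypothetical data).
wide = pd.DataFrame({"2000": ["A", "B"], "2001": ["B", "B"]},
                    index=["Item1", "Item2"])

# On a DataFrame with a flat index, unstack() returns a Series whose
# MultiIndex is (column label, row label), i.e. (year, item).
long = wide.unstack()
print(long)

# reset_index() turns the two unnamed index levels into columns named
# 'level_0' and 'level_1'; the values land in a column labelled 0.
long_df = long.reset_index()
print(long_df.columns.tolist())
```

This is why the rename step in this answer maps 'level_0' to the year and 'level_1' to the item.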

Then, if you want, rename the columns so the pivot is easier to read:

df_full = df_full.rename(columns={'level_0': 'year', 'level_1': 'item', 0: 'DF1', 1:'DF2'})

Finally, build the pivot table:

res_out = pd.pivot_table(data=df_full, index='DF1', columns='year', values='DF2', aggfunc='mean')

It is not a one-line solution, but it works. The complete code:

df_full = pd.concat([df1.unstack(), df2.unstack()], axis=1).reset_index()
df_full = df_full.rename(columns={'level_0': 'year', 'level_1': 'item', 0: 'DF1', 1:'DF2'})
res_out = pd.pivot_table(data=df_full, index='DF1', columns='year', values='DF2', aggfunc='mean')
display(res_out)
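One way to check the result against the table from the question is sketched below. It assumes the question's 0.43 is the mean of 0.7, 0.4 and 0.2 rounded to two decimals, and it uses print instead of display so it also runs outside a notebook; the names expected and matches are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"2000": ["A", "A", "B"],
                    "2001": ["A", "B", "B"],
                    "2002": ["B", "B", "B"]},
                   index=["Item1", "Item2", "Item3"])
df2 = pd.DataFrame({"2000": [0.5, 0.7, 0.1],
                    "2001": [0.6, 0.6, 0.3],
                    "2002": [0.7, 0.4, 0.2]},
                   index=["Item1", "Item2", "Item3"])

# Same three steps as in the answer: wide to long, rename, pivot.
df_full = pd.concat([df1.unstack(), df2.unstack()], axis=1).reset_index()
df_full = df_full.rename(columns={"level_0": "year", "level_1": "item",
                                  0: "DF1", 1: "DF2"})
res_out = pd.pivot_table(data=df_full, index="DF1", columns="year",
                         values="DF2", aggfunc="mean")

# Expected table from the question (values rounded to two decimals).
expected = pd.DataFrame({"2000": [0.6, 0.1],
                         "2001": [0.6, 0.45],
                         "2002": [np.nan, 0.43]},
                        index=["A", "B"])

# Compare after rounding; equal_nan=True treats the missing A/2002 cell as equal.
matches = np.allclose(res_out.round(2).to_numpy(dtype=float),
                      expected.to_numpy(dtype=float), equal_nan=True)
print(matches)
```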

Answer 2

This code, which uses stack, join, and unstack, should work:

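# Convert both tables from wide to long; the index becomes (item, year)
df1_long = df1.stack().to_frame().rename({0:'category'}, axis=1)
df2_long = df2.stack().to_frame().rename({0:'values'}, axis=1)
# Join on the shared (item, year) index and turn the index levels into columns
joined_data = df1_long.join(df2_long).reset_index().rename({'level_0':'item','level_1':'year'}, axis=1)
# Average only the numeric 'values' column; in pandas 2.x, mean() raises on the string 'item' column
res = joined_data.groupby(['category', 'year'])['values'].mean().unstack()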

display(res)
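As a variation on the same idea, the sketch below names the axes before stacking so that no 'level_0' / 'level_1' renaming is needed afterwards. The variable names states, values and joined are illustrative choices, not part of the original answer, and print is used in place of display.

```python
import pandas as pd

df1 = pd.DataFrame({"2000": ["A", "A", "B"],
                    "2001": ["A", "B", "B"],
                    "2002": ["B", "B", "B"]},
                   index=["Item1", "Item2", "Item3"])
df2 = pd.DataFrame({"2000": [0.5, 0.7, 0.1],
                    "2001": [0.6, 0.6, 0.3],
                    "2002": [0.7, 0.4, 0.2]},
                   index=["Item1", "Item2", "Item3"])

# Name the row and column axes first, so the stacked index levels
# are already called 'item' and 'year'.
states = df1.rename_axis(index="item", columns="year").stack().rename("category")
values = df2.rename_axis(index="item", columns="year").stack().rename("values")

# Align the two long Series on their shared (item, year) index.
joined = pd.concat([states, values], axis=1).reset_index()

# Average the numeric column per (category, year), then spread years into columns.
res = joined.groupby(["category", "year"])["values"].mean().unstack("year")
print(res.round(2))
```

Selecting the 'values' column before calling mean() keeps the string 'item' column out of the aggregation and gives a result with plain year columns rather than a two-level column index.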
