
Pivot over multiple tables with pandas

I want to create a pivot with average values over multiple tables. Here is an example of what I want to produce: the inputs are df1 and df2, and res is the result I want to calculate from them.

import pandas as pd
import numpy as np

df1 = pd.DataFrame({"2000": ["A", "A", "B"],
                   "2001": ["A", "B", "B"],
                   "2002": ["B", "B", "B"]},
                   index =['Item1', 'Item2', 'Item3'])

df2 = pd.DataFrame({"2000": [0.5, 0.7, 0.1],
                   "2001": [0.6, 0.6, 0.3],
                   "2002": [0.7, 0.4, 0.2]},
                   index =['Item1', 'Item2', 'Item3'])

display(df1)
display(df2)

res = pd.DataFrame({"2000": [0.6, 0.1],
                   "2001": [0.6, 0.45], 
                   "2002": [np.nan, 0.43]},
                   index =['A', 'B'])

display(res)

Both DataFrames have years as columns, and each row is an item. The items change state over time; the state is defined in df1. Each item also has a value for every year, defined in df2. I want to calculate the average value per year for each state group (A, B).

I have not managed to compute res. Any suggestions?

To solve this, first combine both DataFrames into one. For example, the code below converts each DataFrame from wide to long format, concatenates the two on their shared (year, item) index, and then resets the index so that year and item become ordinary columns the pivot can use:

df_full = pd.concat([df1.unstack(), df2.unstack()], axis=1).reset_index()


Then, optionally, rename the columns so the pivot is easier to read:


df_full = df_full.rename(columns={'level_0': 'year', 'level_1': 'item', 0: 'DF1', 1:'DF2'})


Finally, build the pivot table:

res_out = pd.pivot_table(data=df_full, index='DF1', columns='year', values='DF2', aggfunc='mean')


It's not a one-line solution, but it works. The complete code:

df_full = pd.concat([df1.unstack(), df2.unstack()], axis=1).reset_index()
df_full = df_full.rename(columns={'level_0': 'year', 'level_1': 'item', 0: 'DF1', 1:'DF2'})
res_out = pd.pivot_table(data=df_full, index='DF1', columns='year', values='DF2', aggfunc='mean')
display(res_out)
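
As a quick sanity check, the steps above can be run end to end and compared with the table the question asks for. This is only a sketch: the `expected` frame and the `matches` flag are names introduced here for illustration, the columns are renamed by direct assignment instead of `rename`, and `print` stands in for `display` so the snippet also runs outside a notebook.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"2000": ["A", "A", "B"],
                    "2001": ["A", "B", "B"],
                    "2002": ["B", "B", "B"]},
                   index=["Item1", "Item2", "Item3"])
df2 = pd.DataFrame({"2000": [0.5, 0.7, 0.1],
                    "2001": [0.6, 0.6, 0.3],
                    "2002": [0.7, 0.4, 0.2]},
                   index=["Item1", "Item2", "Item3"])

# Wide -> long: one row per (year, item), with state and value side by side.
df_full = pd.concat([df1.unstack(), df2.unstack()], axis=1).reset_index()
df_full.columns = ["year", "item", "DF1", "DF2"]

# Average the values (DF2) per state (DF1) and year.
res_out = pd.pivot_table(data=df_full, index="DF1", columns="year",
                         values="DF2", aggfunc="mean")

# Table from the question, with the 2002 value for B left unrounded
# (hypothetical helper frame, used only for this comparison).
expected = pd.DataFrame({"2000": [0.6, 0.1],
                         "2001": [0.6, 0.45],
                         "2002": [np.nan, 1.3 / 3]},
                        index=["A", "B"])

# Compare the numbers only, treating NaN as equal to NaN.
matches = np.allclose(res_out.to_numpy(), expected.to_numpy(), equal_nan=True)

print(df_full.head())
print(res_out)
print("matches expected:", matches)
```

State A has no items in 2002, so that cell stays NaN rather than 0, which is what the question's res shows as well.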

Alternatively, this code using stack, join and unstack should work:

df1_long = df1.stack().to_frame().rename({0:'category'}, axis=1)
df2_long = df2.stack().to_frame().rename({0:'values'}, axis=1)
joined_data = df1_long.join(df2_long).reset_index().rename({'level_0':'item','level_1':'year'}, axis=1)
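# Select the numeric column before averaging; the string 'item' column cannot be averaged.
res = joined_data.groupby(['category', 'year'])['values'].mean().unstack()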

display(res)
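
One detail that is easy to trip over when comparing the two answers: `stack()` and `unstack()` hold the same data but order the index levels differently, which is why `level_0` is renamed to `item` here and to `year` in the first answer. A minimal sketch on a cut-down copy of df1 (the variable names `stacked`, `unstacked` and `same` are illustrative only):

```python
import pandas as pd

df1 = pd.DataFrame({"2000": ["A", "A", "B"],
                    "2001": ["A", "B", "B"]},
                   index=["Item1", "Item2", "Item3"])

stacked = df1.stack()      # index levels: (item, year)
unstacked = df1.unstack()  # index levels: (year, item)

print(stacked.index[1])
print(unstacked.index[1])

# The state for a given (item, year) pair is the same either way;
# only the order of the two index levels differs.
same = all(stacked.loc[(item, year)] == unstacked.loc[(year, item)]
           for item, year in stacked.index)
print(same)
```

Either reshaping can feed the join or concat step, as long as the rename mapping matches the level order.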
