
Pandas Pivot with extra column

I have a simple question about how to pivot a Pandas DataFrame, with the added complication of an extra column.

The dataset looks like this:

import pandas as pd
X = pd.DataFrame({'country':['Peru','Peru','Japan','Japan'],'method':['m1','m2','m1','m2'], 'value':[1,2,3,4]})

Country   |   Method    |   Value
Peru      |   m1        |   1
Peru      |   m2        |   2
Japan     |   m1        |   3
Japan     |   m2        |   4

Every country has a value for every method. I would like to pivot this dataframe so that each country becomes a column, but I need to keep the method as a column as well:

Peru |  Japan | Method
1    |  3     | m1
2    |  4     | m2

Thanks for the help!

You will need to apply .pivot to X, followed by .reset_index.

I have also removed the name of the columns axis for cleaner output.

df = X.pivot(index='method',columns='country',values='value').reset_index() 
df.columns.name = ''
print(df)

Output:

  method  Japan  Peru
0     m1      3     1
1     m2      4     2
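
The output above puts method first and lists the countries alphabetically, while the question asked for Peru, Japan, Method. A minimal sketch of one way to get that layout, assuming plain column selection on the pivoted frame is acceptable (the name `ordered` is only illustrative, and rename_axis is used here in place of setting columns.name):

```python
import pandas as pd

X = pd.DataFrame({'country': ['Peru', 'Peru', 'Japan', 'Japan'],
                  'method': ['m1', 'm2', 'm1', 'm2'],
                  'value': [1, 2, 3, 4]})

# Pivot as in the answer, then drop the leftover "country" label on the columns.
df = (X.pivot(index='method', columns='country', values='value')
        .rename_axis(None, axis=1)
        .reset_index())

# Select the columns in the order requested in the question.
ordered = df[['Peru', 'Japan', 'method']]
print(ordered)
```

Selecting with a list of column names returns a new frame, so `df` itself keeps its original column order.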

Solution with set_index and unstack:

print (X.set_index(['method','country'])['value']
        .unstack(fill_value=0)
        .rename_axis(None, axis=1)
        .reset_index())

  method  Japan  Peru
0     m1      3     1
1     m2      4     2
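
The fill_value=0 argument only makes a difference when some (method, country) combination is missing. A small sketch of that difference, assuming the (m1, Japan) row is dropped from the sample data purely for illustration (the names `partial`, `without_fill` and `with_fill` are hypothetical):

```python
import pandas as pd

X = pd.DataFrame({'country': ['Peru', 'Peru', 'Japan', 'Japan'],
                  'method': ['m1', 'm2', 'm1', 'm2'],
                  'value': [1, 2, 3, 4]})

# Hypothetical gap: remove the (m1, Japan) row so one cell has no value.
partial = X[~((X['country'] == 'Japan') & (X['method'] == 'm1'))]
series = partial.set_index(['method', 'country'])['value']

without_fill = series.unstack()            # the missing cell becomes NaN
with_fill = series.unstack(fill_value=0)   # the missing cell becomes 0

print(without_fill)
print(with_fill)
```

Whether 0 is a sensible stand-in for a missing measurement depends on the data; leaving NaN in place may be the safer choice in some cases.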

But if the (method, country) pairs contain duplicates, you get an error:

ValueError: Index contains duplicate entries, cannot reshape
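
To see which rows cause this error, the duplicated pairs can be listed before reshaping. A minimal sketch, assuming sample data with a repeated (m1, Peru) pair (the names `dupes` and `pivot_failed` are only illustrative):

```python
import pandas as pd

X = pd.DataFrame({'country': ['Peru', 'Peru', 'Peru', 'Japan'],
                  'method': ['m1', 'm2', 'm1', 'm2'],
                  'value': [1, 2, 3, 4]})

# keep=False marks every row that belongs to a repeated (method, country) pair.
dupes = X[X.duplicated(subset=['method', 'country'], keep=False)]
print(dupes)

# pivot refuses to reshape while such pairs exist.
pivot_failed = False
try:
    X.pivot(index='method', columns='country', values='value')
except ValueError as err:
    pivot_failed = True
    print('pivot failed:', err)
```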

The solution is groupby with an aggregate function such as mean (sum, ...):

X = pd.DataFrame({'country':['Peru','Peru','Peru','Japan'],
                  'method':['m1','m2','m1','m2'], 
                  'value':[1,2,3,4]})
print (X)
  country method  value
0    Peru     m1      1
1    Peru     m2      2
2    Peru     m1      3 <-duplicates Peru, m1
3   Japan     m2      4

print (X.groupby(['method','country'])['value'].mean()
        .unstack(fill_value=0)
        .rename_axis(None, axis=1)
        .reset_index())

  method  Japan  Peru
0     m1      0     2
1     m2      4     2

Or use pivot_table, which aggregates with the mean by default (spelled out here as aggfunc='mean', so NumPy does not need to be imported):

print (X.pivot_table(index='method',
                     columns='country',
                     values='value',
                     fill_value=0,
                     aggfunc='mean')
        .rename_axis(None, axis=1)
        .reset_index())

  method  Japan  Peru
0     m1      0     2
1     m2      4     2
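
The choice of aggregate function changes the result for the duplicated pair. A short sketch comparing mean and sum on the same data, using a hypothetical helper `pivot_by_country` that wraps the groupby approach (the helper and its parameter names are assumptions for illustration, not part of pandas):

```python
import pandas as pd

X = pd.DataFrame({'country': ['Peru', 'Peru', 'Peru', 'Japan'],
                  'method': ['m1', 'm2', 'm1', 'm2'],
                  'value': [1, 2, 3, 4]})

def pivot_by_country(frame, aggfunc):
    """Hypothetical helper: one row per method, one column per country."""
    return (frame.groupby(['method', 'country'])['value']
                 .agg(aggfunc)
                 .unstack(fill_value=0)
                 .rename_axis(None, axis=1)
                 .reset_index())

by_mean = pivot_by_country(X, 'mean')  # (m1, Peru) -> (1 + 3) / 2
by_sum = pivot_by_country(X, 'sum')    # (m1, Peru) -> 1 + 3

print(by_mean)
print(by_sum)
```

With mean the two (m1, Peru) values average to 2, while sum adds them to 4; the missing (m1, Japan) cell is filled with 0 in both cases.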

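Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.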

 