Pandas pivot with an additional column

Question

I have a simple question about how to pivot a Pandas DataFrame when an additional column has to be carried along.

The dataset looks like this:

import pandas as pd
X = pd.DataFrame({'country':['Peru','Peru','Japan','Japan'],'method':['m1','m2','m1','m2'], 'value':[1,2,3,4]})

Country   |   Method    |   Value
Peru      |   m1        |   1
Peru      |   m2        |   2
Japan     |   m1        |   3
Japan     |   m2        |   4

All the countries have values for all the methods. I would like to pivot this DataFrame so that each country becomes a column, but I need to carry the method along:

Peru |  Japan | Method
1    |  3     | m1
2    |  4     | m2

Thanks for the help!

Answer 1

You will need to apply .pivot to X, followed by .reset_index.

I have also removed the name of the columns axis for cleaner output.

df = X.pivot(index='method',columns='country',values='value').reset_index() 
df.columns.name = ''
print(df)

Output:

  method  Japan  Peru
0     m1      3     1
1     m2      4     2
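
The table requested in the question lists the country columns first and the method last, whereas pivot sorts the column labels alphabetically. A minimal sketch of reordering after the pivot, assuming the same X as in the question (the column order is simply chosen to match the requested table):

```python
import pandas as pd

X = pd.DataFrame({'country': ['Peru', 'Peru', 'Japan', 'Japan'],
                  'method': ['m1', 'm2', 'm1', 'm2'],
                  'value': [1, 2, 3, 4]})

# Pivot so each country becomes a column, drop the columns axis name,
# then turn 'method' back into an ordinary column.
df = (X.pivot(index='method', columns='country', values='value')
        .rename_axis(None, axis=1)
        .reset_index())

# Reorder the columns to match the layout asked for in the question.
df = df[['Peru', 'Japan', 'method']]
print(df)
```

Selecting with a list of labels only changes the column order; the values are untouched.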

Answer 2

Solution with set_index and unstack:

print (X.set_index(['method','country'])['value']
        .unstack(fill_value=0)
        .rename_axis(None, axis=1)
        .reset_index())

  method  Japan  Peru
0     m1      3     1
1     m2      4     2

But if you get the following error (because of duplicates in the method, country pairs):

ValueError: Index contains duplicate entries, cannot reshape

use groupby with an aggregate function such as mean (or sum, ...):

X = pd.DataFrame({'country':['Peru','Peru','Peru','Japan'],
                  'method':['m1','m2','m1','m2'], 
                  'value':[1,2,3,4]})
print (X)
  country method  value
0    Peru     m1      1
1    Peru     m2      2
2    Peru     m1      3 <-duplicates Peru, m1
3   Japan     m2      4

print (X.groupby(['method','country'])['value'].mean()
        .unstack(fill_value=0)
        .rename_axis(None, axis=1)
        .reset_index())

  method  Japan  Peru
0     m1      0     2
1     m2      4     2

Or use pivot_table, whose default aggfunc is the mean:

print (X.pivot_table(index='method', 
                     columns='country', 
                     values='value', 
                     fill_value=0, 
                     aggfunc='mean').
                     rename_axis(None, axis=1).
                     reset_index())

  method  Japan  Peru
0     m1      0     2
1     m2      4     2
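
To decide up front whether a plain pivot will work or an aggregation is needed, one option is to check for repeated (method, country) pairs first. A small sketch, assuming the duplicated X from this answer; the helper name safe_pivot is hypothetical and not part of pandas:

```python
import pandas as pd

X = pd.DataFrame({'country': ['Peru', 'Peru', 'Peru', 'Japan'],
                  'method': ['m1', 'm2', 'm1', 'm2'],
                  'value': [1, 2, 3, 4]})

def safe_pivot(frame, aggfunc='mean'):
    """Hypothetical helper: pivot directly, or aggregate first if pairs repeat."""
    keys = ['method', 'country']
    if frame.duplicated(keys).any():
        # Repeated pairs would make pivot raise ValueError, so aggregate them.
        wide = frame.pivot_table(index='method', columns='country',
                                 values='value', aggfunc=aggfunc, fill_value=0)
    else:
        wide = frame.pivot(index='method', columns='country', values='value')
    # Drop the columns axis name and restore 'method' as a column.
    return wide.rename_axis(None, axis=1).reset_index()

has_dupes = bool(X.duplicated(['method', 'country']).any())
result = safe_pivot(X)
print(has_dupes)
print(result)
```

Passing aggfunc='sum' instead would add the duplicated values rather than average them; which one is appropriate depends on what the repeated rows mean in the data.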

Answer 1 (score 1, accepted): 2017-02-21 03:18:31

Answer 2 (score 0): 2017-02-21 06:21:42
