Pandas Pivot with extra column
I have a simple question about how to pivot a Pandas DataFrame when there is an additional column that I need to keep.
The dataset looks like this:
import pandas as pd
X = pd.DataFrame({'country': ['Peru', 'Peru', 'Japan', 'Japan'], 'method': ['m1', 'm2', 'm1', 'm2'], 'value': [1, 2, 3, 4]})
Country | Method | Value
Peru | m1 | 1
Peru | m2 | 2
Japan | m1 | 3
Japan | m2 | 4
All the countries have values for all the methods. I would like to pivot this dataframe so that each country becomes a column, but I need to carry the method along:
Peru | Japan | Method
1 | 3 | m1
2 | 4 | m2
Thanks for the help!
You will need to apply .pivot to X, followed by .reset_index. I have also removed the name of the column axis for cleaner output.
df = X.pivot(index='method',columns='country',values='value').reset_index()
df.columns.name = ''
print(df)
Output:
  method  Japan  Peru
0     m1      3     1
1     m2      4     2
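For reference, here is a minimal self-contained sketch of the same approach. It assumes the sample data from the question, uses rename_axis(None, axis=1) instead of assigning to df.columns.name, and adds one step that is not in the answer above: reordering the columns to the Peru | Japan | Method layout the question asked for. The name `wide` is only illustrative.

```python
import pandas as pd

X = pd.DataFrame({
    'country': ['Peru', 'Peru', 'Japan', 'Japan'],
    'method': ['m1', 'm2', 'm1', 'm2'],
    'value': [1, 2, 3, 4],
})

# Pivot so each country becomes a column, with method as the row key.
wide = (X.pivot(index='method', columns='country', values='value')
         .rename_axis(None, axis=1)   # drop the "country" label on the column axis
         .reset_index())              # turn method back into a regular column

# Reorder the columns to match the layout requested in the question.
wide = wide[['Peru', 'Japan', 'method']]
print(wide)
```

Selecting the columns explicitly is only practical when the country names are known in advance; otherwise the alphabetical order produced by pivot is usually kept.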
Solution with set_index and unstack:
print (X.set_index(['method','country'])['value']
.unstack(fill_value=0)
.rename_axis(None, axis=1)
.reset_index())
  method  Japan  Peru
0     m1      3     1
1     m2      4     2
But if you get the following error (because the method, country pairs contain duplicates):
ValueError: Index contains duplicate entries, cannot reshape
then one solution is groupby with an aggregate function such as mean (sum, ...):
X = pd.DataFrame({'country':['Peru','Peru','Peru','Japan'],
'method':['m1','m2','m1','m2'],
'value':[1,2,3,4]})
print (X)
  country method  value
0    Peru     m1      1
1    Peru     m2      2
2    Peru     m1      3   <- duplicate of (Peru, m1)
3   Japan     m2      4
print (X.groupby(['method','country'])['value'].mean()
.unstack(fill_value=0)
.rename_axis(None, axis=1)
.reset_index())
  method  Japan  Peru
0     m1      0     2
1     m2      4     2
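Before choosing an aggregate function, it can help to see which (method, country) pairs are actually duplicated. The sketch below is one possible way to check, assuming the sample data with the duplicate row from above; the names `keys`, `dupes` and `counts` are illustrative and not part of the original answer.

```python
import pandas as pd

X = pd.DataFrame({
    'country': ['Peru', 'Peru', 'Peru', 'Japan'],
    'method': ['m1', 'm2', 'm1', 'm2'],
    'value': [1, 2, 3, 4],
})

keys = ['method', 'country']

# keep=False marks every row that belongs to a duplicated (method, country) pair.
dupes = X[X.duplicated(subset=keys, keep=False)]
print(dupes)

# Count rows per pair; any count above 1 would make pivot/unstack raise.
counts = X.groupby(keys).size()
print(counts[counts > 1])
```

If the duplicates turn out to be data-entry errors rather than repeated measurements, dropping them may be more appropriate than averaging.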
Or pivot_table, whose default aggregation is the mean:
print (X.pivot_table(index='method',
                     columns='country',
                     values='value',
                     fill_value=0,
                     aggfunc='mean')   # the string 'mean' needs no numpy import
        .rename_axis(None, axis=1)
        .reset_index())
  method  Japan  Peru
0     m1      0     2
1     m2      4     2
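The answer mentions that other aggregates such as sum can be used instead of mean. As a rough illustration, assuming the same sample data with the duplicated (Peru, m1) pair, the sketch below passes aggfunc='sum' to pivot_table; the name `total` is illustrative.

```python
import pandas as pd

X = pd.DataFrame({
    'country': ['Peru', 'Peru', 'Peru', 'Japan'],
    'method': ['m1', 'm2', 'm1', 'm2'],
    'value': [1, 2, 3, 4],
})

# Sum the values of duplicated (method, country) pairs instead of averaging them.
total = (X.pivot_table(index='method',
                       columns='country',
                       values='value',
                       fill_value=0,      # pairs with no rows become 0
                       aggfunc='sum')
          .rename_axis(None, axis=1)
          .reset_index())
print(total)
```

Here the two Peru, m1 rows (1 and 3) are combined into 4, whereas mean would give 2. Which aggregate is right depends on what the duplicate rows represent.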