Keep pandas dataframe columns and their order in pivot table
I have a dataframe:
import pandas as pd

df = pd.DataFrame({'No': [123,123,123,523,523,523,765],
'Type': ['A','B','C','A','C','D','A'],
'Task': ['First','Second','First','Second','Third','First','Fifth'],
'Color': ['blue','red','blue','black','red','red','red'],
'Price': [10,5,1,12,12,12,18],
'Unit': ['E','E','E','E','E','E','E'],
'Pers.ID': [45,6,6,43,1,9,2]
})
So it looks like this:
df
+-----+------+--------+-------+-------+------+---------+
| No  | Type | Task   | Color | Price | Unit | Pers.ID |
+-----+------+--------+-------+-------+------+---------+
| 123 | A    | First  | blue  | 10    | E    | 45      |
| 123 | B    | Second | red   | 5     | E    | 6       |
| 123 | C    | First  | blue  | 1     | E    | 6       |
| 523 | A    | Second | black | 12    | E    | 43      |
| 523 | C    | Third  | red   | 12    | E    | 1       |
| 523 | D    | First  | red   | 12    | E    | 9       |
| 765 | A    | Fifth  | red   | 18    | E    | 2       |
+-----+------+--------+-------+-------+------+---------+
Then I created a pivot table:
piv = pd.pivot_table(df, index=['No','Type','Task'])
Result:
                 Pers.ID  Price
No  Type Task
123 A    First        45     10
    B    Second        6      5
    C    First         6      1
523 A    Second       43     12
    C    Third         1     12
    D    First         9     12
765 A    Fifth         2     18
As you can see, there are two problems:
- Some columns are gone (Color and Unit).
- The order of the columns Price and Pers.ID is not the same as in the original dataframe.
I tried to fix this by executing:
cols = list(df.columns)
piv = pd.pivot_table(df, index=['No','Type','Task'], values = cols)
but the result is the same.
I read other posts, but none of them matched my problem closely enough for me to apply the solution.
Thank you!
EDIT: desired output
                 Color  Price  Unit  Pers.ID
No  Type Task
123 A    First    blue     10     E       45
    B    Second    red      5     E        6
    C    First    blue      1     E        6
523 A    Second  black     12     E       43
    C    Third     red     12     E        1
    D    First     red     12     E        9
765 A    Fifth     red     18     E        2
I think the problem is that the default aggregation function of pivot_table is mean, so string columns are excluded (older pandas versions drop them silently, while newer versions raise a TypeError instead). So you need a custom aggregation function. The column order also changes because pivot_table sorts the columns, so reindex is necessary:
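import numpy as np
import pandas as pd

# sum numeric columns, join all other (string) columns with ', '
f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else ', '.join(x)
# all columns except the index columns, in their original order
cols = df.columns[~df.columns.isin(['No','Type','Task'])].tolist()
piv = (pd.pivot_table(df,
                      index=['No','Type','Task'],
                      values=cols,
                      aggfunc=f)
         .reindex(columns=cols))  # pivot_table sorts the columns, so restore the original order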
print (piv)
                 Color  Price  Unit  Pers.ID
No  Type Task
123 A    First    blue     10     E       45
    B    Second    red      5     E        6
    C    First    blue      1     E        6
523 A    Second  black     12     E       43
    C    Third     red     12     E        1
    D    First     red     12     E        9
765 A    Fifth     red     18     E        2
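Because every No/Type/Task combination in this example occurs only once, there is nothing to actually combine. A built-in aggregation that works for any dtype can therefore replace the custom lambda. The following is a minimal sketch that assumes the keys stay unique. Swapping in `aggfunc='first'` is my substitution and is not part of the answer above. With repeated keys it would silently keep only the first row of each group.

```python
import pandas as pd

df = pd.DataFrame({'No': [123, 123, 123, 523, 523, 523, 765],
                   'Type': ['A', 'B', 'C', 'A', 'C', 'D', 'A'],
                   'Task': ['First', 'Second', 'First', 'Second', 'Third', 'First', 'Fifth'],
                   'Color': ['blue', 'red', 'blue', 'black', 'red', 'red', 'red'],
                   'Price': [10, 5, 1, 12, 12, 12, 18],
                   'Unit': ['E', 'E', 'E', 'E', 'E', 'E', 'E'],
                   'Pers.ID': [45, 6, 6, 43, 1, 9, 2]})

keys = ['No', 'Type', 'Task']
# every non-key column, in its original order
cols = [c for c in df.columns if c not in keys]

# 'first' works for numeric and string columns alike; it is only safe
# when each key combination is unique (otherwise later rows are ignored)
piv = (pd.pivot_table(df, index=keys, values=cols, aggfunc='first')
         .reindex(columns=cols))  # pivot_table sorts the columns; restore the order
print(piv)
```

The `reindex` step is still needed here, because the column sorting comes from pivot_table itself and not from the aggregation function.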
Another solution uses groupby with the same aggregation function; there the column order is not a problem:
# store the result in a new variable so df keeps its original columns for later use
df1 = (df.groupby(['No','Type','Task'])
         .agg(lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else ', '.join(x)))
print (df1)
                 Color  Price  Unit  Pers.ID
No  Type Task
123 A    First    blue     10     E       45
    B    Second    red      5     E        6
    C    First    blue      1     E        6
523 A    Second  black     12     E       43
    C    Third     red     12     E        1
    D    First     red     12     E        9
765 A    Fifth     red     18     E        2
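The lambda only does real work when a key combination occurs more than once. In that case numeric columns are summed and text columns are joined. The following is a minimal sketch of that behaviour. It assumes a hypothetical extra row that repeats 123/A/First, and the helper name `collapse_rows` is illustrative only. It also swaps `pd.api.types.is_numeric_dtype` in for `np.issubdtype`, because the former accepts pandas extension dtypes (such as the string dtype) as well.

```python
import pandas as pd

def collapse_rows(frame, keys):
    """Hypothetical helper: sum numeric columns, comma-join other columns per key group."""
    def combine(col):
        if pd.api.types.is_numeric_dtype(col):
            return col.sum()
        return ', '.join(col.astype(str))
    # groupby keeps the non-key columns in their original order
    return frame.groupby(keys).agg(combine)

# hypothetical data: the first two rows share the key (123, 'A', 'First')
df = pd.DataFrame({'No': [123, 123, 123],
                   'Type': ['A', 'A', 'B'],
                   'Task': ['First', 'First', 'Second'],
                   'Color': ['blue', 'green', 'red'],
                   'Price': [10, 5, 5],
                   'Unit': ['E', 'E', 'E'],
                   'Pers.ID': [45, 7, 6]})

out = collapse_rows(df, ['No', 'Type', 'Task'])
print(out)
```

For the merged group, Color becomes `'blue, green'`, Unit becomes `'E, E'`, and Price becomes 15. Pers.ID is also summed, which is probably not meaningful for an ID column, so you may want a per-column rule in real data.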
But if you only need to set the first 3 columns as a MultiIndex (no aggregation is needed, because each No/Type/Task combination is unique here):
df = df.set_index(['No','Type','Task'])
print (df)
                 Color  Price  Unit  Pers.ID
No  Type Task
123 A    First    blue     10     E       45
    B    Second    red      5     E        6
    C    First    blue      1     E        6
523 A    Second  black     12     E       43
    C    Third     red     12     E        1
    D    First     red     12     E        9
765 A    Fifth     red     18     E        2
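`set_index` does no aggregation at all. It is therefore only equivalent to the aggregating solutions when every No/Type/Task combination is unique. The following is a minimal sketch of checking that first. It uses `DataFrame.duplicated(subset=...)` and the `verify_integrity=True` flag of `set_index`, which raises a `ValueError` if the new index contains duplicates. The shortened frame and the duplicated row are hypothetical test data.

```python
import pandas as pd

keys = ['No', 'Type', 'Task']
df = pd.DataFrame({'No': [123, 123, 523],
                   'Type': ['A', 'B', 'A'],
                   'Task': ['First', 'Second', 'Second'],
                   'Color': ['blue', 'red', 'black'],
                   'Price': [10, 5, 12]})

# unique keys: set_index alone keeps every column and its order
print(df.duplicated(subset=keys).any())   # False
indexed = df.set_index(keys, verify_integrity=True)

# hypothetical repeated key: set_index would keep both rows, so refuse it
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)
try:
    dup.set_index(keys, verify_integrity=True)
    duplicate_detected = False
except ValueError:
    duplicate_detected = True
print(duplicate_detected)  # True
```

If duplicates are detected, fall back to one of the aggregating approaches instead of `set_index`.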