Pandas pivot_table : cannot achieve correct format
I have the following data as setup:
import pandas as pd
df = pd.DataFrame({
'index_1' : ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
'index_2' : ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D'],
'month' : ['jan', 'jan', 'feb', 'feb', 'jan', 'jan', 'feb', 'feb'],
'value_1' : range(0, 8),
'value_2' : range(8, 16)
})
print(df)
# index_1 index_2 month value_1 value_2
# 0 A A jan 0 8
# 1 A B jan 1 9
# 2 A C feb 2 10
# 3 A D feb 3 11
# 4 B A jan 4 12
# 5 B B jan 5 13
# 6 B C feb 6 14
# 7 B D feb 7 15
My expected output would look like this (I did it by hand):
print(expected_output)
# jan feb
# month value_1 value_2 value_1 value_2
# index_1 index_2
# A A 0.0 8.0 NaN NaN
# B 1.0 9.0 NaN NaN
# C NaN NaN 2.0 10.0
# D NaN NaN 3.0 11.0
# B A 4.0 12.0 NaN NaN
# B 5.0 13.0 NaN NaN
# C NaN NaN 6.0 14.0
# D NaN NaN 7.0 15.0
There must be something I cannot wrap my mind around. I achieved the following, which has the right data but in the wrong format:
df = pd.pivot_table(
df,
index=['index_1', 'index_2'],
columns=['month'],
# The 2 following lines are optional and don't change the output here
# (the default aggfunc is actually 'mean', which matches 'sum' for single-value cells).
# values=['value_1', 'value_2'],
# aggfunc='sum'
)
print(df)
# value_1 value_2
# month feb jan feb jan
# index_1 index_2
# A A NaN 0.0 NaN 8.0
# B NaN 1.0 NaN 9.0
# C 2.0 NaN 10.0 NaN
# D 3.0 NaN 11.0 NaN
# B A NaN 4.0 NaN 12.0
# B NaN 5.0 NaN 13.0
# C 6.0 NaN 14.0 NaN
# D 7.0 NaN 15.0 NaN
I also tried using some .groupby(), along with .transpose(), but I have a hard time correctly formatting this DataFrame. I have already read the following documentation: pivot_table, reshaping dataframe, and this canonical by PiRSquared.
Use an ordered Categorical so the months sort correctly, together with DataFrame.swaplevel and DataFrame.sort_index:
months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
df['month'] = pd.Categorical(df['month'], categories=months, ordered=True)
df = pd.pivot_table(
df,
index=['index_1', 'index_2'],
columns=['month'],
).swaplevel(0,1, axis=1).sort_index(axis=1)
print(df)
month jan feb
value_1 value_2 value_1 value_2
index_1 index_2
A A 0.0 8.0 NaN NaN
B 1.0 9.0 NaN NaN
C NaN NaN 2.0 10.0
D NaN NaN 3.0 11.0
B A 4.0 12.0 NaN NaN
B 5.0 13.0 NaN NaN
C NaN NaN 6.0 14.0
D NaN NaN 7.0 15.0
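To see why the ordered Categorical matters, here is a minimal sketch that runs the same reshape twice, once with plain string months and once with the categorical version. The helper `reshape` and the names `plain`, `ordered` and `cat_df` are mine, not part of the answer above; the data is the question's setup.

```python
import pandas as pd

df = pd.DataFrame({
    'index_1': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'index_2': ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D'],
    'month': ['jan', 'jan', 'feb', 'feb', 'jan', 'jan', 'feb', 'feb'],
    'value_1': range(0, 8),
    'value_2': range(8, 16),
})

def reshape(frame):
    # Hypothetical helper: pivot, move 'month' to the top column level, sort columns.
    return (pd.pivot_table(frame, index=['index_1', 'index_2'], columns=['month'])
              .swaplevel(0, 1, axis=1)
              .sort_index(axis=1))

# Plain strings: sort_index orders the months alphabetically.
plain = reshape(df)

# Ordered Categorical: sort_index follows the calendar order given in `months`.
months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
          'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
cat_df = df.assign(month=pd.Categorical(df['month'], categories=months, ordered=True))
ordered = reshape(cat_df)

plain_months = list(plain.columns.get_level_values('month').unique())
ordered_months = list(ordered.columns.get_level_values('month').unique())
print(plain_months)
print(ordered_months)
```

With plain strings the month level comes out alphabetically (feb before jan); the categorical is what puts jan first. Depending on the pandas version, pivot_table may emit a FutureWarning about the `observed` default when a column key is categorical.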
Or:
months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
df['month'] = pd.Categorical(df['month'], categories=months, ordered=True)
df = (df.set_index(['index_1','index_2','month'])
.unstack()
.swaplevel(0,1, axis=1)
.sort_index(axis=1))
print(df)
month jan feb
value_1 value_2 value_1 value_2
index_1 index_2
A A 0.0 8.0 NaN NaN
B 1.0 9.0 NaN NaN
C NaN NaN 2.0 10.0
D NaN NaN 3.0 11.0
B A 4.0 12.0 NaN NaN
B 5.0 13.0 NaN NaN
C NaN NaN 6.0 14.0
D NaN NaN 7.0 15.0
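One difference between the two variants worth keeping in mind: set_index plus unstack only reshapes, so it needs each (index_1, index_2, month) combination to be unique, while pivot_table aggregates duplicates (mean by default). A small sketch, using a made-up two-row frame `dup` that is not part of the question's data:

```python
import pandas as pd

# Hypothetical data: the same (index_1, index_2, month) key appears twice.
dup = pd.DataFrame({
    'index_1': ['A', 'A'],
    'index_2': ['A', 'A'],
    'month': ['jan', 'jan'],
    'value_1': [0, 2],
})

# unstack cannot place two values in one cell, so it raises ValueError.
try:
    dup.set_index(['index_1', 'index_2', 'month']).unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True

# pivot_table aggregates the duplicates instead (default aggfunc is 'mean').
pivoted = pd.pivot_table(dup, index=['index_1', 'index_2'], columns=['month'])
mean_value = pivoted.loc[('A', 'A'), ('value_1', 'jan')]
print(unstack_failed, mean_value)
```

For the data in the question every combination is unique, so both approaches give the same table.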
How about stack and unstack:
df.set_index(['index_1','index_2','month']).stack().unstack([2,3])
month jan feb
value_1 value_2 value_1 value_2
index_1 index_2
A A 0.0 8.0 NaN NaN
B 1.0 9.0 NaN NaN
C NaN NaN 2.0 10.0
D NaN NaN 3.0 11.0
B A 4.0 12.0 NaN NaN
B 5.0 13.0 NaN NaN
C NaN NaN 6.0 14.0
D NaN NaN 7.0 15.0
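If the one-liner is hard to follow, it can be split into its intermediate steps. This is only a sketch with my own variable names (`indexed`, `stacked`, `wide`), using the question's setup data:

```python
import pandas as pd

df = pd.DataFrame({
    'index_1': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'index_2': ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D'],
    'month': ['jan', 'jan', 'feb', 'feb', 'jan', 'jan', 'feb', 'feb'],
    'value_1': range(0, 8),
    'value_2': range(8, 16),
})

# Step 1: the three key columns become a 3-level row index.
indexed = df.set_index(['index_1', 'index_2', 'month'])

# Step 2: stack() moves the column names (value_1, value_2) into the row
# index as a 4th level, giving a long Series with one value per row.
stacked = indexed.stack()

# Step 3: unstack([2, 3]) moves level 2 (month) and level 3 (the value name)
# to the columns, in that order, so month ends up as the top column level.
wide = stacked.unstack([2, 3])

print(stacked.index.nlevels)
print(wide.columns.names)
print(wide.shape)
```

The order in which the months appear in the columns depends on how month is ordered, so combining this with the ordered Categorical from the other answer may still be needed.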
import pandas as pd
df = pd.DataFrame({
'index_1' : ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
'index_2' : ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D'],
'month' : ['jan', 'jan', 'feb', 'feb', 'jan', 'jan', 'feb', 'feb'],
'value_1' : range(0, 8),
'value_2' : range(8, 16)})
print(df)
df = df.set_index(['index_1','index_2','month']).unstack(level=-1)
print(df)
Instead of a pivot table, this is something called a hierarchical index.
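As a rough illustration of what the hierarchical (MultiIndex) columns give you, here is a sketch of two ways to select from the unstacked frame. The names `wide`, `value_1_only` and `jan_only` are mine; the data is the question's setup.

```python
import pandas as pd

df = pd.DataFrame({
    'index_1': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'index_2': ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D'],
    'month': ['jan', 'jan', 'feb', 'feb', 'jan', 'jan', 'feb', 'feb'],
    'value_1': range(0, 8),
    'value_2': range(8, 16),
})

wide = df.set_index(['index_1', 'index_2', 'month']).unstack(level=-1)

# Selecting on the top column level returns the sub-frame below it:
# here, one column per month for value_1.
value_1_only = wide['value_1']

# xs() takes a cross-section on an inner level: here, every value column
# for 'jan', with the 'month' level dropped from the result.
jan_only = wide.xs('jan', axis=1, level='month')

print(value_1_only.columns.tolist())
print(jan_only.columns.tolist())
```

Note that in this layout the value names are the top column level and month is below them, so a swaplevel on the columns would still be needed to reach the layout asked for in the question.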