Reshaping a data frame in pandas
Let's say I have this data frame:
df = pd.DataFrame({'n':[0 ,1 ,0 ,0 ,1 ,1 ,0 ,1],'l':[12 ,16 ,92, 77 ,32 ,47, 22, 14], 'cols':['col1','col1','col1','col1','col2','col2','col2','col2']})
and this is what I'm trying to get:
col1    col2
 l  n    l  n
12  0   32  1
16  1   47  1
92  0   22  0
77  0   14  1
I've been playing around with the set_index and stack / unstack methods, but with no success...
import pandas as pd
df = pd.DataFrame(
{'n':[0 ,1 ,0 ,0 ,1 ,1 ,0 ,1],'l':[12 ,16 ,92, 77 ,32 ,47, 22, 14],
'cols':['col1','col1','col1','col1','col2','col2','col2','col2']})
df['index'] = df.groupby(['cols']).cumcount()
result = df.pivot(index='index', columns='cols')
print(result)
#          l         n
# cols  col1 col2 col1 col2
# index
# 0       12   32    0    1
# 1       16   47    1    1
# 2       92   22    0    0
# 3       77   14    0    1
If you care about the order of the labels in the MultiIndex columns, you could use stack and unstack to exactly reproduce the result you posted:
result = result.stack(level=0).unstack(level=1)
print(result)
# cols  col1    col2
#          l  n    l  n
# index
# 0       12  0   32  1
# 1       16  1   47  1
# 2       92  0   22  0
# 3       77  0   14  1
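If the round trip through stack and unstack feels indirect, one possible alternative is to swap the two column levels and then sort them. This is only a sketch and not part of the original answer; it assumes the same df as above and relies on DataFrame.swaplevel and DataFrame.sort_index with axis=1. The sort happens to give the requested order here because 'l' comes before 'n' alphabetically.

```python
import pandas as pd

df = pd.DataFrame({
    'n': [0, 1, 0, 0, 1, 1, 0, 1],
    'l': [12, 16, 92, 77, 32, 47, 22, 14],
    'cols': ['col1'] * 4 + ['col2'] * 4,
})
df['index'] = df.groupby('cols').cumcount()
result = df.pivot(index='index', columns='cols')

# Put the `cols` labels on the top column level, then sort the columns so
# that the values belonging to each label sit next to each other.
reordered = result.swaplevel(axis=1).sort_index(axis=1)
print(reordered.columns.tolist())
# [('col1', 'l'), ('col1', 'n'), ('col2', 'l'), ('col2', 'n')]
```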
When looking for a solution, it is often useful to think backwards. Start with the desired DataFrame and ask yourself what operation might produce it. In this case, the operation that came to mind was pd.pivot. Then the question becomes: what DataFrame, something, is needed so that
desired = something.pivot(index='index', columns='cols')
By looking at other examples of pivot in action, it became clear that something had to equal
   cols   l  n  index
0  col1  12  0      0
1  col1  16  1      1
2  col1  92  0      2
3  col1  77  0      3
4  col2  32  1      0
5  col2  47  1      1
6  col2  22  0      2
7  col2  14  1      3
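To check the backwards reasoning, the table above can be typed in by hand and pivoted. The snippet below is a small sketch of that experiment; the literal values are copied from the table, and the variable names something and desired follow the text.

```python
import pandas as pd

# Hand-built copy of the `something` table shown above.
something = pd.DataFrame({
    'cols': ['col1'] * 4 + ['col2'] * 4,
    'l': [12, 16, 92, 77, 32, 47, 22, 14],
    'n': [0, 1, 0, 0, 1, 1, 0, 1],
    'index': [0, 1, 2, 3, 0, 1, 2, 3],
})

# With no `values` argument, pivot spreads every remaining column (`l` and `n`).
desired = something.pivot(index='index', columns='cols')

# The columns are a two-level MultiIndex, so a single column is addressed
# by a (value name, cols label) pair.
print(desired[('l', 'col2')].tolist())  # [32, 47, 22, 14]
```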
Then you see if you can find a way to massage df into something, or, again working backwards, massage something into df... From this point of view, the missing link became apparent: something has an index column that df lacked.
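The missing index column is simply each row's position within its cols group, which is what groupby(...).cumcount() computes. A minimal sketch, assuming the df from the question (position is a name introduced here for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'n': [0, 1, 0, 0, 1, 1, 0, 1],
    'l': [12, 16, 92, 77, 32, 47, 22, 14],
    'cols': ['col1'] * 4 + ['col2'] * 4,
})

# cumcount numbers the rows of each group 0, 1, 2, ... in order of appearance.
position = df.groupby('cols').cumcount()
print(position.tolist())  # [0, 1, 2, 3, 0, 1, 2, 3]

# Attaching it as a column named `index` turns df into `something`.
something = df.assign(index=position)
```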
You can use a combination of DataFrame.groupby, DataFrame.reset_index and DataFrame.T (transpose):
import pandas as pd
df = pd.DataFrame({'n': [0, 1, 0, 0, 1, 1, 0, 1], 'l': [12, 16, 92, 77, 32, 47, 22, 14], 'cols': ['col1', 'col1', 'col1', 'col1', 'col2', 'col2', 'col2', 'col2']})
# Select the value columns up front so the grouping column never reaches the lambda
print(df.groupby('cols')[['l', 'n']].apply(lambda x: x.reset_index(drop=True).T).T)
Output:
cols col1    col2
        l  n    l  n
0      12  0   32  1
1      16  1   47  1
2      92  0   22  0
3      77  0   14  1
Or you can use concat:
groups = df.groupby('cols')
print(pd.concat([g[['l', 'n']].reset_index(drop=True) for _, g in groups], axis=1, keys=[name for name, _ in groups]))
Output:
  col1    col2
     l  n    l  n
0   12  0   32  1
1   16  1   47  1
2   92  0   22  0
3   77  0   14  1
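The concat version leans on two behaviours that are easy to miss: keys becomes the top column level when axis=1, and rows are aligned on the index, which is why each group is passed through reset_index(drop=True) first. The sketch below illustrates both on two tiny frames; left, right and shifted are made-up names for illustration only.

```python
import pandas as pd

left = pd.DataFrame({'l': [12, 16], 'n': [0, 1]})
right = pd.DataFrame({'l': [32, 47], 'n': [1, 1]})

# One key per frame; with axis=1 the keys form the top level of the columns.
wide = pd.concat([left, right], axis=1, keys=['col1', 'col2'])
print(wide.columns.tolist())
# [('col1', 'l'), ('col1', 'n'), ('col2', 'l'), ('col2', 'n')]

# concat aligns rows on the index, so frames whose row labels differ
# do not end up side by side; the gaps are filled with NaN instead.
shifted = right.copy()
shifted.index = [2, 3]
misaligned = pd.concat([left, shifted], axis=1, keys=['col1', 'col2'])
print(len(wide), len(misaligned))  # 2 4
```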
Hope it helps :)