Reshaping a data frame in pandas

Let's say I have this data frame:

df = pd.DataFrame({'n':[0 ,1 ,0 ,0 ,1 ,1 ,0 ,1],'l':[12 ,16 ,92, 77 ,32 ,47, 22, 14], 'cols':['col1','col1','col1','col1','col2','col2','col2','col2']})

and this is what I'm trying to get:

col1    col2
l   n   l   n
12  0   32  1
16  1   47  1
92  0   22  0
77  0   14  1

I've been playing around with the set_index and stack / unstack methods, but with no success...

import pandas as pd

df = pd.DataFrame(
    {'n':[0 ,1 ,0 ,0 ,1 ,1 ,0 ,1],'l':[12 ,16 ,92, 77 ,32 ,47, 22, 14],
     'cols':['col1','col1','col1','col1','col2','col2','col2','col2']})

df['index'] = df.groupby(['cols']).cumcount()
result = df.pivot(index='index', columns='cols')
print(result)
#           l           n      
# cols   col1  col2  col1  col2
# index                        
# 0        12    32     0     1
# 1        16    47     1     1
# 2        92    22     0     0
# 3        77    14     0     1
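Since the question mentions set_index and unstack, here is a minimal sketch of the same reshaping written with those two methods instead of pivot. The variable names with_counter and via_unstack are only illustrative; the idea is the same per-group counter, moved into the index together with cols before unstacking.

```python
import pandas as pd

df = pd.DataFrame(
    {'n': [0, 1, 0, 0, 1, 1, 0, 1], 'l': [12, 16, 92, 77, 32, 47, 22, 14],
     'cols': ['col1', 'col1', 'col1', 'col1', 'col2', 'col2', 'col2', 'col2']})

# Number the rows within each 'cols' group: 0, 1, 2, 3 for col1, then again for col2.
with_counter = df.assign(index=df.groupby('cols').cumcount())

# Put the counter and 'cols' into the index, then move 'cols' into the columns.
via_unstack = with_counter.set_index(['index', 'cols']).unstack('cols')
print(via_unstack)
```

The counter is what makes each (index, cols) pair unique, which unstack needs in order to place every value in exactly one cell.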

If you care about the order of the labels in the MultiIndex columns, you could use stack and unstack to exactly reproduce the result you posted:

result = result.stack(level=0).unstack(level=1)
print(result)

# cols   col1     col2   
#           l  n     l  n
# index                  
# 0        12  0    32  1
# 1        16  1    47  1
# 2        92  0    22  0
# 3        77  0    14  1
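As an alternative to the stack / unstack round trip, the two column levels can probably also be reordered directly. This is a sketch using swaplevel and sort_index on the column axis, not the method from the answer above; the name reordered is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'n': [0, 1, 0, 0, 1, 1, 0, 1], 'l': [12, 16, 92, 77, 32, 47, 22, 14],
     'cols': ['col1', 'col1', 'col1', 'col1', 'col2', 'col2', 'col2', 'col2']})

df['index'] = df.groupby(['cols']).cumcount()
result = df.pivot(index='index', columns='cols')

# Swap the two column levels so 'cols' is on top, then sort the columns
# so that each of col1 / col2 is followed by its l and n sub-columns.
reordered = result.swaplevel(0, 1, axis=1).sort_index(axis=1)
print(reordered)
```

Sorting the columns also puts l before n inside each group, which matches the layout asked for in the question.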

When looking for a solution it is often useful to think backwards.

Start with the desired DataFrame and ask yourself what operation might produce it. In this case, the operation that came to mind was pd.pivot. Then the question becomes: what DataFrame, something, is needed so that

desired = something.pivot(index='index', columns='cols') 

By looking at other examples of pivot in action, it became clear that something had to equal

   cols   l  n  index
0  col1  12  0      0
1  col1  16  1      1
2  col1  92  0      2
3  col1  77  0      3
4  col2  32  1      0
5  col2  47  1      1
6  col2  22  0      2
7  col2  14  1      3

Then you see if you can find a way to massage df into something, or, working backwards again, massage something into df. From this point of view, the missing link became apparent: something has an index column that df lacked.
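To make the missing link concrete, here is a small sketch of how that index column can be built with groupby and cumcount. The name something follows the text above; it is just df with the extra column added.

```python
import pandas as pd

df = pd.DataFrame(
    {'n': [0, 1, 0, 0, 1, 1, 0, 1], 'l': [12, 16, 92, 77, 32, 47, 22, 14],
     'cols': ['col1', 'col1', 'col1', 'col1', 'col2', 'col2', 'col2', 'col2']})

# cumcount numbers the rows inside each group, starting from 0.
counter = df.groupby('cols').cumcount()
print(counter.tolist())
# [0, 1, 2, 3, 0, 1, 2, 3]

# Adding it as a column gives the DataFrame that pivot needs.
something = df.assign(index=counter)
desired = something.pivot(index='index', columns='cols')
```

Without the counter, pivot has no row label that tells it which col1 row belongs next to which col2 row.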

You can use a combination of DataFrame.groupby, DataFrame.reset_index and DataFrame.T (transpose):

import pandas as pd

df = pd.DataFrame({'n':[0 ,1 ,0 ,0 ,1 ,1 ,0, 1],'l':[12 ,16 ,92, 77 ,32 ,47, 22, 14], 'cols':['col1','col1','col1','col1','col2','col2','col2','col2']})
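# For each group: renumber the rows, drop the grouping column and transpose;
# the outer .T turns the stacked pieces back into side-by-side columns.
print(df.groupby('cols').apply(lambda x: x.reset_index(drop=True).drop('cols', axis=1).T).T)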

Output:

cols  col1     col2   
         l  n     l  n
0       12  0    32  1
1       16  1    47  1
2       92  0    22  0
3       77  0    14  1

Or you can use concat:

# Build one renumbered frame per group and place them side by side under their group keys.
print(pd.concat([g.drop('cols', axis=1).reset_index(drop=True) for _, g in df.groupby('cols')], axis=1, keys=df['cols'].unique()))

Output:

   col1     col2   
      l  n     l  n
0    12  0    32  1
1    16  1    47  1
2    92  0    22  0
3    77  0    14  1
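The concat one-liner is fairly dense, so here is an unpacked sketch of the same idea. One deliberate difference, which is my assumption about what is safer rather than part of the original answer: the keys come from the group names yielded by groupby instead of df['cols'].unique(), so each key is guaranteed to sit above its own group even when the labels do not first appear in sorted order. The names pieces and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'n': [0, 1, 0, 0, 1, 1, 0, 1], 'l': [12, 16, 92, 77, 32, 47, 22, 14],
     'cols': ['col1', 'col1', 'col1', 'col1', 'col2', 'col2', 'col2', 'col2']})

pieces = {}
for name, group in df.groupby('cols'):
    # Drop the grouping column and renumber rows from 0 so the pieces line up.
    pieces[name] = group.drop(columns='cols').reset_index(drop=True)

# Passing a dict to concat uses its keys as the top level of the column MultiIndex.
result = pd.concat(pieces, axis=1)
print(result)
```

The sub-columns keep the column order of df, so append .sort_index(axis=1) if you want l before n.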

Hope it helps :)
