Pandas: transpose columns (b, c, d) as a grouped index

Question

I have a dataframe:

import pandas as pd
import numpy as np
data = pd.DataFrame({'file': ['file1', 'file1', 'file1', 'file2', 'file2', 'file2'],
                     'x': [1, 2, 3, 1, 2, 4],
                     'y': [10, 20, 30, 10, 20, 40],
                     'norm_y': [2, 4, 6, 2, 4, 8]})

print(data)

out: 
    file  x   y  norm_y
0  file1  1  10       2
1  file1  2  20       4
2  file1  3  30       6
3  file2  1  10       2
4  file2  2  20       4
5  file2  4  40       8

I want to print it so that:

file is the primary index
x, y, norm_y are the sub-index

so that it looks like this:

    file          
0         x     1  2  3
1  file1  y     10 20 30
2         ynorm 2  4  6
3         x     1  2  4
4  file2  y     10 20 40
5         ynorm 2  4  8

I think the answer will be something like:

Set the row index: data.set_index(['file'])
Transpose the x, y, norm_y columns
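A quick check of the two proposed steps, applied to the whole frame at once (a minimal sketch that rebuilds the question's `data`): after `set_index` and `.T`, the file labels become repeated column headers instead of an outer row level, so the transpose has to be done separately for each file.

```python
import pandas as pd

# The DataFrame from the question.
data = pd.DataFrame({
    'file': ['file1', 'file1', 'file1', 'file2', 'file2', 'file2'],
    'x': [1, 2, 3, 1, 2, 4],
    'y': [10, 20, 30, 10, 20, 40],
    'norm_y': [2, 4, 6, 2, 4, 8],
})

# Step 1: move 'file' into the row index.
indexed = data.set_index(['file'])

# Step 2: transpose the whole frame. The rows become x, y, norm_y, and the
# file labels turn into six (duplicated) column headers.
flipped = indexed.T
print(flipped)
```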

Answer 1

At its core this is a pivot problem, but not a simple one.

df = data  # the DataFrame from the question
df.assign(
  key=df.groupby('file').cumcount()  # position of each row within its file
).set_index(['file', 'key']).stack().unstack('key')

key            0   1   2
file
file1 x        1   2   3
      y       10  20  30
      norm_y   2   4   6
file2 x        1   2   4
      y       10  20  40
      norm_y   2   4   8
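To see what the `key` helper column does, here is a minimal sketch of the same idea written with `melt` and `pivot` instead of `stack`/`unstack`. It assumes pandas 1.1 or later, where `DataFrame.pivot` accepts a list for `index`; the rows within each file may come out in sorted order (`norm_y`, `x`, `y`) rather than in column order.

```python
import pandas as pd

# The DataFrame from the question.
data = pd.DataFrame({
    'file': ['file1', 'file1', 'file1', 'file2', 'file2', 'file2'],
    'x': [1, 2, 3, 1, 2, 4],
    'y': [10, 20, 30, 10, 20, 40],
    'norm_y': [2, 4, 6, 2, 4, 8],
})

# Number the rows inside each file: 0, 1, 2, 0, 1, 2.
keyed = data.assign(key=data.groupby('file').cumcount())

# Long format: one row per (file, key, variable) with its value.
long = keyed.melt(id_vars=['file', 'key'])

# Rows: (file, variable); columns: the position within the file.
# pivot raises if a (file, variable, key) combination repeats, so it also
# checks that cumcount made every cell unique.
result = long.pivot(index=['file', 'variable'], columns='key', values='value')
print(result)
```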

Answer 2

You just need a little imagination:

data.set_index('file').groupby(level=0).apply(lambda x: x.T)

Output:

file          file2  file2  file2
file                             
file1 x           1      2      3
      y          10     20     30
      norm_y      2      4      6
file2 x           1      2      4
      y          10     20     40
      norm_y      2      4      8
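In the output above, the header row repeats a file label because each transposed block's columns are that group's `file` values. A minimal variant, assuming positional column labels (0, 1, 2) are preferred, resets each group's index before transposing so every block shares the same column labels:

```python
import pandas as pd

# The DataFrame from the question.
data = pd.DataFrame({
    'file': ['file1', 'file1', 'file1', 'file2', 'file2', 'file2'],
    'x': [1, 2, 3, 1, 2, 4],
    'y': [10, 20, 30, 10, 20, 40],
    'norm_y': [2, 4, 6, 2, 4, 8],
})

# Transpose each file's block separately. reset_index(drop=True) swaps the
# repeated file labels for positions 0, 1, 2, so every transposed block has
# the same column labels and the blocks line up when pandas combines them.
result = (
    data.set_index('file')
    .groupby(level=0)
    .apply(lambda g: g.reset_index(drop=True).T)
)
print(result)
```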

Answer 3

You can do:

df = data.copy()  # the DataFrame from the question
df['col'] = df.groupby('file').cumcount() + 1  # position within each file, starting at 1
df.pivot_table(index='file', columns='col').stack(level=0)

Output:

col            1   2   3
file                    
file1 norm_y   2   4   6
      x        1   2   3
      y       10  20  30
file2 norm_y   2   4   8
      x        1   2   4
      y       10  20  40
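The intermediate table before `.stack(level=0)` shows how this works: `pivot_table` produces one row per file and a two-level column index of (original column, position). Its default `aggfunc` is the mean, but because the `cumcount` positions make each (file, col) pair unique, each cell is just the original value. A minimal sketch of that intermediate step:

```python
import pandas as pd

# The DataFrame from the question.
data = pd.DataFrame({
    'file': ['file1', 'file1', 'file1', 'file2', 'file2', 'file2'],
    'x': [1, 2, 3, 1, 2, 4],
    'y': [10, 20, 30, 10, 20, 40],
    'norm_y': [2, 4, 6, 2, 4, 8],
})

df = data.copy()
df['col'] = df.groupby('file').cumcount() + 1  # 1, 2, 3 within each file

# The table before .stack(level=0): one row per file, and a two-level column
# index of (original column, position within the file).
wide = df.pivot_table(index='file', columns='col')
print(wide)
```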

Answer 4

Playing with numpy reshape:

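df = data  # the DataFrame from the question

# melt lists the values column by column, so each block of 3 consecutive
# rows shares one file and one variable (this assumes exactly 3 rows per file)
fil, var, val = df.melt('file').values.T

new = pd.DataFrame(np.hstack([fil.reshape(-1, 3)[:, 0].reshape(-1, 1),  # file of each block
                              var.reshape(-1, 3)[:, 0].reshape(-1, 1),  # variable of each block
                              val.reshape(-1, 3)]))\
        .set_index([0, 1])\
        .sort_index()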

               2   3   4
0     1                 
file1 norm_y   2   4   6
      x        1   2   3
      y       10  20  30
file2 norm_y   2   4   8
      x        1   2   4
      y       10  20  40
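The `melt(...).values` route above goes through a mixed-type NumPy array (strings and integers), so the resulting value columns have `object` dtype, and the `reshape(-1, 3)` calls hard-code 3 rows per file. A sketch of a purely numeric reshape, assuming every file has the same number of rows (checked with an `assert`), keeps the integer dtype:

```python
import pandas as pd

# The DataFrame from the question.
data = pd.DataFrame({
    'file': ['file1', 'file1', 'file1', 'file2', 'file2', 'file2'],
    'x': [1, 2, 3, 1, 2, 4],
    'y': [10, 20, 30, 10, 20, 40],
    'norm_y': [2, 4, 6, 2, 4, 8],
})
cols = ['x', 'y', 'norm_y']

# A stable sort by 'file' makes each file's rows contiguous while keeping
# their original order inside the file.
ordered = data.sort_values('file', kind='mergesort')

# Assumption: every file has the same number of rows.
sizes = ordered.groupby('file').size()
assert sizes.nunique() == 1, 'every file needs the same number of rows'
files = list(sizes.index)    # file labels in sorted order
n_rows = int(sizes.iloc[0])  # rows per file

# (file, row, column) -> (file, column, row) -> one output row per (file, column)
values = (
    ordered[cols].to_numpy()
    .reshape(len(files), n_rows, len(cols))
    .transpose(0, 2, 1)
    .reshape(-1, n_rows)
)
index = pd.MultiIndex.from_product([files, cols], names=['file', 'variable'])
result = pd.DataFrame(values, index=index)
print(result)
```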

Answer 5

Try this:

(
    pd.melt(data, id_vars='file', value_vars=['x', 'y', 'norm_y'])  # unpivot to long format: file, variable, value
    .groupby(['file', 'variable'])['value']  # group the values by file and variable
    .aggregate(lambda x: tuple(x))  # collect each group's values into one tuple
    .apply(pd.Series)  # expand each tuple into columns 0, 1, 2
)
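Answer 5 does not show its output. A minimal sketch of the same melt-and-collect idea, swapping `.apply(pd.Series)` for a single `pd.DataFrame` constructor call (which avoids building one intermediate Series per row), gives one row per (file, variable) and columns 0, 1, 2:

```python
import pandas as pd

# The DataFrame from the question.
data = pd.DataFrame({
    'file': ['file1', 'file1', 'file1', 'file2', 'file2', 'file2'],
    'x': [1, 2, 3, 1, 2, 4],
    'y': [10, 20, 30, 10, 20, 40],
    'norm_y': [2, 4, 6, 2, 4, 8],
})

# Long format: columns file, variable, value.
long = pd.melt(data, id_vars='file', value_vars=['x', 'y', 'norm_y'])

# One list of values per (file, variable), in the original row order.
grouped = long.groupby(['file', 'variable'])['value'].agg(list)

# Expand the lists into positional columns 0, 1, 2 in a single constructor call.
result = pd.DataFrame(grouped.tolist(), index=grouped.index)
print(result)
```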

5 answers

Answer 1: score 3, accepted, 2019-10-07 17:15:29
Answer 2: score 3, 2019-10-07 17:29:40
Answer 3: score 2, 2019-10-07 17:17:16
Answer 4: score 1, 2019-10-07 17:23:17
Answer 5: score 0, 2019-10-07 17:30:01
