Transposing multiple columns in a Pandas dataframe

Question

I'm trying to reshape a dataframe, but I'm not able to get the results I need. The dataframe looks like this:

    m   r   s   p   O       W       N         
    1   4   3   1   2.81    3.70    3.03  
    1   4   4   1   2.14    2.82    2.31  
    1   4   5   1   1.47    1.94    1.59  
    1   4   3   2   0.58    0.78    0.60  
    1   4   4   2   0.67    0.00    0.00
    1   4   5   2   1.03    2.45    1.68
    1   4   3   3   1.98    1.34    1.81
    1   4   4   3   0.00    0.04    0.15
    1   4   5   3   0.01    0.00    0.26

I need to reshape the dataframe so it will look like this:

    m   r   s   p   O       W       N       p   O       W       N       p   O       W       N
    1   4   3   1   2.81    3.70    3.03    2   0.58    0.78    0.60    3   1.98    1.34    1.81
    1   4   4   1   2.14    2.82    2.31    2   0.67    0.00    0.00    3   0.00    0.04    0.15
    1   4   5   1   1.47    1.94    1.59    2   1.03    2.45    1.68    3   0.01    0.00    0.26

I tried to use the pivot_table function:

df.pivot_table(index=['m','r','s'], columns=['p'], values=['O','W','N'])

but I'm not able to get quite what I want. Does anyone know how to do this?

Answer 1

I fancy myself pretty handy with pandas, yet I still find the pivot_table and melt functions confusing. I prefer to stick with a well-defined and unique index and use the stack and unstack methods of the dataframe itself.
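To make the stack/unstack idea concrete, here is a minimal sketch on a made-up frame (the names `key`, `p` and `val` are hypothetical, not from the question). It assumes a unique index, which is what lets unstack move a level into the columns without ambiguity:

```python
import pandas as pd

# Hypothetical toy data with a unique two-level index (key, p).
df = pd.DataFrame(
    {"key": ["a", "a", "b", "b"], "p": [1, 2, 1, 2], "val": [10, 20, 30, 40]}
).set_index(["key", "p"])

# unstack moves the index level 'p' into the columns (long -> wide).
wide = df.unstack(level="p")

# stack moves the column level 'p' back into the index (wide -> long).
long = wide.stack(level="p")

print(wide)
print(long)
```

After `unstack`, the columns become a two-level MultiIndex of (original column, p value), which is the same shape of result the answer builds below.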

First, do you really need to repeat the p column like that? I can sort of see its value when presenting data, but IMO pandas isn't really set up to work like that. We could shoehorn it in, but let's see if a simpler solution gets you what you need.

Here's what I would do:

from io import StringIO
import pandas

datatable = StringIO("""\
    m   r   s   p   O       W       N         
    1   4   3   1   2.81    3.70    3.03  
    1   4   4   1   2.14    2.82    2.31  
    1   4   5   1   1.47    1.94    1.59  
    1   4   3   2   0.58    0.78    0.60  
    1   4   4   2   0.67    0.00    0.00
    1   4   5   2   1.03    2.45    1.68
    1   4   3   3   1.98    1.34    1.81
    1   4   4   3   0.00    0.04    0.15
    1   4   5   3   0.01    0.00    0.26""")

df = (
    pandas.read_table(datatable, sep=r'\s+')
          .set_index(['m', 'r', 's', 'p'])
          .unstack(level='p')
)

df.columns = df.columns.swaplevel(0, 1)
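# DataFrame.sort was removed from pandas; sort_index is its replacement.
# Sort only on the p level so O, W, N keep their original order within each p.
df.sort_index(axis=1, level=0, sort_remaining=False, inplace=True)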

print(df)

Which prints:

p         1                 2                 3            
          O     W     N     O     W     N     O     W     N
m r s                                                      
1 4 3  2.81  3.70  3.03  0.58  0.78  0.60  1.98  1.34  1.81
    4  2.14  2.82  2.31  0.67  0.00  0.00  0.00  0.04  0.15
    5  1.47  1.94  1.59  1.03  2.45  1.68  0.01  0.00  0.26

So now the columns are a MultiIndex, and you can access, for example, all of the values where p = 2 with df[2] or df.xs(2, level='p', axis=1), which gives me:

          O     W     N
m r s                  
1 4 3  0.58  0.78  0.60
    4  0.67  0.00  0.00
    5  1.03  2.45  1.68
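As a quick check that the two spellings select the same thing, here is a small sketch. It rebuilds only a two-row, two-measure slice of the reshaped frame by hand (an assumption made to keep the example self-contained):

```python
import pandas as pd

# Hand-built slice of the reshaped frame: columns are (p, measure) pairs.
cols = pd.MultiIndex.from_product([[1, 2], ["O", "W"]], names=["p", None])
df = pd.DataFrame(
    [[2.81, 3.70, 0.58, 0.78],
     [2.14, 2.82, 0.67, 0.00]],
    index=pd.Index([3, 4], name="s"),
    columns=cols,
)

by_key = df[2]                       # select on the first column level
by_xs = df.xs(2, level="p", axis=1)  # same selection, naming the level

print(by_key)
print(by_key.equals(by_xs))
```

Both forms drop the p level from the result, so what comes back has plain O and W columns.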

Similarly, you can get all of the W columns with df.xs('W', level=1, axis=1) (we say level=1 because that column level does not have a name, so we use its position instead):

p         1     2     3
m r s                  
1 4 3  3.70  0.78  1.34
    4  2.82  0.00  0.04
    5  1.94  2.45  0.00
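If the positional level=1 feels fragile, one option is to give that level a name first. A minimal sketch, again on a hand-built slice of the frame; the level name `measure` is my own invention, not something from the question:

```python
import pandas as pd

# Hand-built slice of the reshaped frame; the second column level is unnamed.
cols = pd.MultiIndex.from_product([[1, 2], ["O", "W"]], names=["p", None])
df = pd.DataFrame(
    [[2.81, 3.70, 0.58, 0.78],
     [2.14, 2.82, 0.67, 0.00]],
    index=pd.Index([3, 4], name="s"),
    columns=cols,
)

# Select by position, as in the answer.
by_pos = df.xs("W", level=1, axis=1)

# Name the second level (hypothetical name 'measure'), then select by name.
named = df.copy()
named.columns = named.columns.set_names("measure", level=1)
by_name = named.xs("W", level="measure", axis=1)

print(by_name)
print(by_pos.equals(by_name))
```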

You can similarly query the rows by using axis=0.
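With axis=0, xs works on the row index instead. A small sketch, assuming the same m/r/s index as above and using just the p = 2 values for O and W to keep it short:

```python
import pandas as pd

# Row index with the three levels from the question; values are the p = 2 block.
idx = pd.MultiIndex.from_tuples(
    [(1, 4, 3), (1, 4, 4), (1, 4, 5)], names=["m", "r", "s"]
)
df = pd.DataFrame({"O": [0.58, 0.67, 1.03], "W": [0.78, 0.00, 2.45]}, index=idx)

# Pick every row where the index level s equals 4.
row = df.xs(4, level="s", axis=0)
print(row)
```

The s level is dropped from the result, leaving m and r as the remaining index levels.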

If you really need the p values in a column, just add them manually and reindex your columns:

for p in df.columns.get_level_values('p').unique():
    df[p, 'p'] = p

cols = pandas.MultiIndex.from_product([[1,2,3], list('pOWN')])
df = df.reindex(columns=cols)
print(df)

       1                    2                    3                  
       p     O     W     N  p     O     W     N  p     O     W     N
m r s                                                               
1 4 3  1  2.81  3.70  3.03  2  0.58  0.78  0.60  3  1.98  1.34  1.81
    4  1  2.14  2.82  2.31  2  0.67  0.00  0.00  3  0.00  0.04  0.15
    5  1  1.47  1.94  1.59  2  1.03  2.45  1.68  3  0.01  0.00  0.26
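If you need a flat table at the end (say, for export), one possible approach is to collapse the two column levels into single names. This is only a sketch: the `name_p` naming scheme is my own choice, and the frame is a hand-built two-row, two-p slice of the one above. pandas does allow the literally repeated names p, O, W, N from the question, but duplicate column names are awkward to work with, so suffixes are used here instead.

```python
import pandas as pd

# Hand-built slice of the final frame: (p, name) column pairs, m/r/s row index.
cols = pd.MultiIndex.from_product([[1, 2], list("pOWN")])
idx = pd.MultiIndex.from_tuples([(1, 4, 3), (1, 4, 4)], names=["m", "r", "s"])
df = pd.DataFrame(
    [[1, 2.81, 3.70, 3.03, 2, 0.58, 0.78, 0.60],
     [1, 2.14, 2.82, 2.31, 2, 0.67, 0.00, 0.00]],
    index=idx,
    columns=cols,
)

flat = df.copy()
# Join each (p, name) pair into one label such as 'O_1' (hypothetical scheme).
flat.columns = [f"{name}_{p}" for p, name in flat.columns]
# Turn the m, r, s index levels back into ordinary columns.
flat = flat.reset_index()

print(flat.columns.tolist())
```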

Answer 2

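    import csv

    # Collect the first field of every line in df_col2.csv, lower-cased.
    with open('df_col2.csv', 'r') as ann:
        firsts = [col.lower().strip('\n').split(',')[0] for col in ann]

    # Write them to ss2.csv as one row holding a single comma-joined cell.
    with open('ss2.csv', 'w', newline='') as b:
        csv.writer(b).writerow([','.join(firsts)])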

Answer 1: score 6, accepted, 2014-09-15 16:40:16

Answer 2: score 0, 2021-03-25 14:05:48
