
Transpose multiple columns in a Pandas dataframe

I'm trying to reshape a dataframe, but I'm not able to get the results I need. The dataframe looks like this:

    m   r   s   p   O       W       N         
    1   4   3   1   2.81    3.70    3.03  
    1   4   4   1   2.14    2.82    2.31  
    1   4   5   1   1.47    1.94    1.59  
    1   4   3   2   0.58    0.78    0.60  
    1   4   4   2   0.67    0.00    0.00
    1   4   5   2   1.03    2.45    1.68
    1   4   3   3   1.98    1.34    1.81
    1   4   4   3   0.00    0.04    0.15
    1   4   5   3   0.01    0.00    0.26

I need to reshape the dataframe so it will look like this:

    m   r   s   p   O       W       N       p   O       W       N       p   O       W       N
    1   4   3   1   2.81    3.70    3.03    2   0.58    0.78    0.60    3   1.98    1.34    1.81
    1   4   4   1   2.14    2.82    2.31    2   0.67    0.00    0.00    3   0.00    0.04    0.15
    1   4   5   1   1.47    1.94    1.59    2   1.03    2.45    1.68    3   0.01    0.00    0.26

I tried to use the pivot_table function

df.pivot_table(index=['m','r','s'], columns=['p'], values=['O','W','N']) 

but I'm not able to get quite what I want. Does anyone know how to do this?
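For reference, here is a minimal sketch of what the pivot_table attempt above returns on the sample data (the names `raw` and `wide` are mine, not from the question). All the values are there, but the columns come back as a two-level MultiIndex with the measure (O, W, N) on the outer level and p on the inner level, so the result is grouped by measure rather than by p.

```python
from io import StringIO

import pandas as pd

# Sample data from the question, whitespace-separated.
raw = StringIO("""\
m r s p O W N
1 4 3 1 2.81 3.70 3.03
1 4 4 1 2.14 2.82 2.31
1 4 5 1 1.47 1.94 1.59
1 4 3 2 0.58 0.78 0.60
1 4 4 2 0.67 0.00 0.00
1 4 5 2 1.03 2.45 1.68
1 4 3 3 1.98 1.34 1.81
1 4 4 3 0.00 0.04 0.15
1 4 5 3 0.01 0.00 0.26
""")
df = pd.read_csv(raw, sep=r"\s+")

# Same call as in the question. Each (m, r, s, p) combination is unique,
# so the default mean aggregation just returns the original values.
wide = df.pivot_table(index=["m", "r", "s"], columns=["p"], values=["O", "W", "N"])

print(wide.columns.nlevels)  # number of column levels
print(wide.shape)            # (rows, columns)

# A single cell is addressed with a (measure, p) tuple on the column side.
print(wide.loc[(1, 4, 3), ("W", 2)])
```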

As someone who fancies himself pretty handy with pandas, I still find the pivot_table and melt functions confusing. I prefer to stick with a well-defined, unique index and use the stack and unstack methods of the dataframe itself.

First, do you really need to repeat the p column like that? I can sort of see its value when presenting data, but IMO pandas isn't really set up to work like that. We could shoehorn it in, but let's see if a simpler solution gets you what you need.

Here's what I would do:

from io import StringIO
import pandas

datatable = StringIO("""\
    m   r   s   p   O       W       N         
    1   4   3   1   2.81    3.70    3.03  
    1   4   4   1   2.14    2.82    2.31  
    1   4   5   1   1.47    1.94    1.59  
    1   4   3   2   0.58    0.78    0.60  
    1   4   4   2   0.67    0.00    0.00
    1   4   5   2   1.03    2.45    1.68
    1   4   3   3   1.98    1.34    1.81
    1   4   4   3   0.00    0.04    0.15
    1   4   5   3   0.01    0.00    0.26""")

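df = (
    pandas.read_table(datatable, sep=r'\s+')
          .set_index(['m', 'r', 's', 'p'])   # unique index on (m, r, s, p)
          .unstack(level='p')                # move p from the rows to the columns
)

# put p on the outer column level
df.columns = df.columns.swaplevel(0, 1)
# group the columns by p; sorting on level 0 only keeps O, W, N
# in their original order within each p
df.sort_index(axis=1, level=0, sort_remaining=False, inplace=True)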

print(df)

Which prints:

p         1                 2                 3            
          O     W     N     O     W     N     O     W     N
m r s                                                      
1 4 3  2.81  3.70  3.03  0.58  0.78  0.60  1.98  1.34  1.81
    4  2.14  2.82  2.31  0.67  0.00  0.00  0.00  0.04  0.15
    5  1.47  1.94  1.59  1.03  2.45  1.68  0.01  0.00  0.26

So now the columns are a MultiIndex, and you can access, for example, all of the values where p = 2 with df[2] or df.xs(2, level='p', axis=1), which gives me:

          O     W     N
m r s                  
1 4 3  0.58  0.78  0.60
    4  0.67  0.00  0.00
    5  1.03  2.45  1.68

Similarly, you can get all of the W columns with df.xs('W', level=1, axis=1) (we say level=1 because that column level does not have a name, so we use its position instead):

p         1     2     3
m r s                  
1 4 3  3.70  0.78  1.34
    4  2.82  0.00  0.04
    5  1.94  2.45  0.00
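If addressing the level by position feels fragile, one option is to give it a name first. The sketch below assumes the same sample data; the level name "measure" is purely hypothetical, chosen for illustration, and `raw`, `by_position` and `by_name` are my own variable names.

```python
from io import StringIO

import pandas as pd

raw = StringIO("""\
m r s p O W N
1 4 3 1 2.81 3.70 3.03
1 4 4 1 2.14 2.82 2.31
1 4 5 1 1.47 1.94 1.59
1 4 3 2 0.58 0.78 0.60
1 4 4 2 0.67 0.00 0.00
1 4 5 2 1.03 2.45 1.68
1 4 3 3 1.98 1.34 1.81
1 4 4 3 0.00 0.04 0.15
1 4 5 3 0.01 0.00 0.26
""")

df = (
    pd.read_csv(raw, sep=r"\s+")
    .set_index(["m", "r", "s", "p"])
    .unstack(level="p")
)
df.columns = df.columns.swaplevel(0, 1)

# Name the second column level ("measure" is a made-up name).
df.columns = df.columns.set_names("measure", level=1)

# Both calls select the same W columns, one by position and one by name.
by_position = df.xs("W", level=1, axis=1)
by_name = df.xs("W", level="measure", axis=1)
print(by_name)
```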

You can similarly query the rows by using axis=0.
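As a rough illustration of the axis=0 case, the sketch below takes a cross-section of the row index, which has the named levels m, r and s. It assumes the same sample data; `raw` and `rows_s4` are names I made up.

```python
from io import StringIO

import pandas as pd

raw = StringIO("""\
m r s p O W N
1 4 3 1 2.81 3.70 3.03
1 4 4 1 2.14 2.82 2.31
1 4 5 1 1.47 1.94 1.59
1 4 3 2 0.58 0.78 0.60
1 4 4 2 0.67 0.00 0.00
1 4 5 2 1.03 2.45 1.68
1 4 3 3 1.98 1.34 1.81
1 4 4 3 0.00 0.04 0.15
1 4 5 3 0.01 0.00 0.26
""")

df = (
    pd.read_csv(raw, sep=r"\s+")
    .set_index(["m", "r", "s", "p"])
    .unstack(level="p")
)
df.columns = df.columns.swaplevel(0, 1)

# Cross-section of the rows: keep only s == 4.
# The selected level is dropped, so the remaining row index is (m, r).
rows_s4 = df.xs(4, level="s", axis=0)
print(rows_s4)
```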

If you really need the p values in a column, just add them manually and reindex your columns:

for p in df.columns.get_level_values('p').unique():
    df[p, 'p'] = p

cols = pandas.MultiIndex.from_product([[1,2,3], list('pOWN')])
df = df.reindex(columns=cols)
print(df)

       1                    2                    3                  
       p     O     W     N  p     O     W     N  p     O     W     N
m r s                                                               
1 4 3  1  2.81  3.70  3.03  2  0.58  0.78  0.60  3  1.98  1.34  1.81
    4  1  2.14  2.82  2.31  2  0.67  0.00  0.00  3  0.00  0.04  0.15
    5  1  1.47  1.94  1.59  2  1.03  2.45  1.68  3  0.01  0.00  0.26
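The snippet above hard-codes [1, 2, 3] when building the new column index. A small variation, sketched here under the assumption that the p values should simply be whatever appears in the data, derives them from the columns instead (`raw` and `p_values` are my own names):

```python
from io import StringIO

import pandas as pd

raw = StringIO("""\
m r s p O W N
1 4 3 1 2.81 3.70 3.03
1 4 4 1 2.14 2.82 2.31
1 4 5 1 1.47 1.94 1.59
1 4 3 2 0.58 0.78 0.60
1 4 4 2 0.67 0.00 0.00
1 4 5 2 1.03 2.45 1.68
1 4 3 3 1.98 1.34 1.81
1 4 4 3 0.00 0.04 0.15
1 4 5 3 0.01 0.00 0.26
""")

df = (
    pd.read_csv(raw, sep=r"\s+")
    .set_index(["m", "r", "s", "p"])
    .unstack(level="p")
)
df.columns = df.columns.swaplevel(0, 1)

# Take the p values from the data rather than typing them out.
p_values = df.columns.get_level_values("p").unique()

# Add a constant "p" column under each p group.
for p in p_values:
    df[p, "p"] = p

# Order the columns as p, O, W, N within each p group.
cols = pd.MultiIndex.from_product([p_values, list("pOWN")])
df = df.reindex(columns=cols)
print(df)
```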
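
    import csv

    # Join the lowercased first field of every line in df_col2.csv
    # and write the result to ss2.csv as a single row.
    sk = ''
    with open('df_col2.csv', 'r') as ann:
        for col in ann:
            an = col.lower().strip('\n').split(',')
            sk += an[0] + ','
    sk = sk[:-1]  # drop the trailing comma
    with open('ss2.csv', 'w', newline='') as b:
        csv.writer(b).writerow([sk])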
