
Pandas stack multiple columns to a single column

I have the following dataframe:

                    ETHNIC                       RACE        AGE       TRT01A
0   NOT HISPANIC OR LATINO                      WHITE  31.824778  Treatment B
1   NOT HISPANIC OR LATINO                      WHITE  31.381246      Placebo
2       HISPANIC OR LATINO                      WHITE  45.522245  Treatment A
3       HISPANIC OR LATINO  BLACK OR AFRICAN AMERICAN  42.910335  Treatment B
4   NOT HISPANIC OR LATINO                      WHITE  31.381246      Placebo
5   NOT HISPANIC OR LATINO                      WHITE  38.045175  Treatment B
6       HISPANIC OR LATINO                      WHITE  39.337440      Placebo
7   NOT HISPANIC OR LATINO                      WHITE  47.121150      Placebo
8   NOT HISPANIC OR LATINO                      WHITE  38.203970  Treatment A
9   NOT HISPANIC OR LATINO  BLACK OR AFRICAN AMERICAN  22.926762      Placebo
10      HISPANIC OR LATINO                      WHITE  45.226557  Treatment B
11      HISPANIC OR LATINO                      WHITE  32.112252      Placebo

Just copy the dataframe above to the clipboard and run df = pd.read_clipboard('\\s\\s+') to load it into a variable.
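
For a reproducible alternative to the clipboard, the same separator pattern can probably be used with pd.read_csv on an in-memory string. This is only a sketch: the variable name raw and the use of io.StringIO are not from the original post, and engine="python" is passed because the separator is a regular expression.

```python
import io

import pandas as pd

# The table from the question, pasted as text (columns are separated by 2+ spaces).
raw = """\
                    ETHNIC                       RACE        AGE       TRT01A
0   NOT HISPANIC OR LATINO                      WHITE  31.824778  Treatment B
1   NOT HISPANIC OR LATINO                      WHITE  31.381246      Placebo
2       HISPANIC OR LATINO                      WHITE  45.522245  Treatment A
3       HISPANIC OR LATINO  BLACK OR AFRICAN AMERICAN  42.910335  Treatment B
4   NOT HISPANIC OR LATINO                      WHITE  31.381246      Placebo
5   NOT HISPANIC OR LATINO                      WHITE  38.045175  Treatment B
6       HISPANIC OR LATINO                      WHITE  39.337440      Placebo
7   NOT HISPANIC OR LATINO                      WHITE  47.121150      Placebo
8   NOT HISPANIC OR LATINO                      WHITE  38.203970  Treatment A
9   NOT HISPANIC OR LATINO  BLACK OR AFRICAN AMERICAN  22.926762      Placebo
10      HISPANIC OR LATINO                      WHITE  45.226557  Treatment B
11      HISPANIC OR LATINO                      WHITE  32.112252      Placebo
"""

# Two or more whitespace characters separate fields, so single spaces inside
# values such as "Treatment B" are preserved. The data rows have one more field
# than the header, so pandas uses the first field as the row index.
df = pd.read_csv(io.StringIO(raw), sep=r"\s\s+", engine="python")
print(df.head())
```

The raw-string pattern r"\s\s+" is the same regular expression as '\\s\\s+' in the question.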

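import numpy as np
import pandas as pd

out = (df.groupby(['TRT01A', 'ETHNIC', 'RACE'])['AGE']
       .agg(mean='mean',
            n='count',
            deviation='std',  # sample standard deviation (ddof=1)
            # np.percentile takes q on a 0-100 scale, so 0.25 is the 0.25th
            # percentile (what the output below shows); use 25 for the first quartile
            Q1=lambda x: np.percentile(x, 0.25)
            )
       .T.unstack().unstack(0)
       )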

I performed some aggregations on the dataframe above, transposed the result, and unstacked it twice to get the following:

TRT01A                                                        Placebo  Treatment A  Treatment B
ETHNIC                 RACE                                                                    
HISPANIC OR LATINO     BLACK OR AFRICAN AMERICAN mean             NaN          NaN    42.910335
                                                 n                NaN          NaN     1.000000
                                                 deviation        NaN          NaN          NaN
                                                 Q1               NaN          NaN    42.910335
                       WHITE                     mean       35.724846    45.522245    45.226557
                                                 n           2.000000     1.000000     1.000000
                                                 deviation   5.108979          NaN          NaN
                                                 Q1         32.130315    45.522245    45.226557
NOT HISPANIC OR LATINO BLACK OR AFRICAN AMERICAN mean       22.926762          NaN          NaN
                                                 n           1.000000          NaN          NaN
                                                 deviation        NaN          NaN          NaN
                                                 Q1         22.926762          NaN          NaN
                       WHITE                     mean       36.627881    38.203970    34.934976
                                                 n           3.000000     1.000000     2.000000
                                                 deviation   9.087438          NaN     4.398485
                                                 Q1         31.381246    38.203970    31.840329
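
One detail worth checking in the table above is the Q1 row, which sits almost exactly on each group's minimum. np.percentile expects q between 0 and 100, so a q of 0.25 asks for the 0.25th percentile rather than the first quartile. The sketch below reproduces this for the two Placebo / HISPANIC OR LATINO / WHITE ages from the input data; the variable names are illustrative only.

```python
import numpy as np

# AGE values of the Placebo / HISPANIC OR LATINO / WHITE group (rows 6 and 11).
ages = np.array([39.337440, 32.112252])

q_as_written = np.percentile(ages, 0.25)  # 0.25th percentile, as in the agg call
q_quartile = np.percentile(ages, 25)      # 25th percentile, i.e. the first quartile

print(round(q_as_written, 6))  # agrees with the Q1 value 32.130315 in the table
print(round(q_quartile, 6))
```

If the first quartile is what is wanted, passing 25 (or using np.quantile with 0.25) would be the usual choice.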

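Now I want to flatten all the index levels to get the following structure (i.e. insert a NaN row for every index level from the first to the second-to-last, plus a Level column indicating the index level):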

                             Placebo  Treatment A  Treatment B  Level
HISPANIC OR LATINO               NaN          NaN          NaN      0 <---
BLACK OR AFRICAN AMERICAN        NaN          NaN          NaN      1 <---
mean                             NaN          NaN    42.910335      2
n                                NaN          NaN     1.000000      2
deviation                        NaN          NaN          NaN      2
Q1                               NaN          NaN    42.910335      2
WHITE                            NaN          NaN          NaN      1 <---
mean                       35.724846    45.522245    45.226557      2
n                           2.000000     1.000000     1.000000      2
deviation                   5.108979          NaN          NaN      2
Q1                         32.130315    45.522245    45.226557      2
NOT HISPANIC OR LATINO           NaN          NaN          NaN      0 <---
BLACK OR AFRICAN AMERICAN        NaN          NaN          NaN      1 <---
mean                       22.926762          NaN          NaN      2
n                           1.000000          NaN          NaN      2
deviation                        NaN          NaN          NaN      2
Q1                         22.926762          NaN          NaN      2
WHITE                            NaN          NaN          NaN      1 <---
mean                       36.627881    38.203970    34.934976      2
n                           3.000000     1.000000     2.000000      2
deviation                   9.087438          NaN     4.398485      2
Q1                         31.381246    38.203970    31.840329      2   

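This question is the same as the previous one I asked, but the catch is that there can be 1 to 4 index columns after aggregation (i.e. the aggregation might be applied on 1 to 5 columns), which makes it hard to reuse the earlier solution here.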

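First, use a custom function with pd.concat (the original answer used DataFrame.append, which was removed in pandas 2.0); the helper DataFrame is filled with NaN values by default:

def f(x):
    # x.name is the (ETHNIC, RACE) group key: build one NaN header row per key level
    names = pd.DataFrame(index=list(x.name), columns=x.columns, dtype=float).assign(Level=[0, 1])
    # drop the two outer index levels from the group's rows and tag them as Level 2
    body = x.reset_index(level=[0, 1], drop=True).assign(Level=2)
    return pd.concat([names, body])

out = out.groupby(level=[0,1], group_keys=False).apply(f)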

Then remove the duplicated level-0 rows:

out = out[~out.index.duplicated() | out['Level'].isin([1,2])]

print (out)
TRT01A                       Placebo  Treatment A  Treatment B  Level
HISPANIC OR LATINO               NaN          NaN          NaN      0
BLACK OR AFRICAN AMERICAN        NaN          NaN          NaN      1
mean                             NaN          NaN    42.910335      2
n                                NaN          NaN     1.000000      2
deviation                        NaN          NaN          NaN      2
Q1                               NaN          NaN    42.910335      2
WHITE                            NaN          NaN          NaN      1
mean                       35.724846    45.522245    45.226557      2
n                           2.000000     1.000000     1.000000      2
deviation                   5.108979          NaN          NaN      2
Q1                         32.130315    45.522245    45.226557      2
NOT HISPANIC OR LATINO           NaN          NaN          NaN      0
BLACK OR AFRICAN AMERICAN        NaN          NaN          NaN      1
mean                       22.926762          NaN          NaN      2
n                           1.000000          NaN          NaN      2
deviation                        NaN          NaN          NaN      2
Q1                         22.926762          NaN          NaN      2
WHITE                            NaN          NaN          NaN      1
mean                       36.627881    38.203970    34.934976      2
n                           3.000000     1.000000     2.000000      2
deviation                   9.087438          NaN     4.398485      2
Q1                         31.381246    38.203970    31.840329      2
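
The function above is written for exactly two outer index levels, while the question mentions anywhere from 1 to 4 index columns. As a rough sketch (not part of the original answer), the same layout can be produced for any number of levels by walking the rows and emitting a NaN header row whenever an outer level changes. The name flatten_levels and the small demo frame, which reuses a subset of the values shown above, are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def flatten_levels(frame):
    """Flatten the row index into single labels, adding a NaN header row each
    time an outer index level changes, plus a Level column."""
    n = frame.index.nlevels
    blank = [np.nan] * frame.shape[1]
    labels, rows, levels = [], [], []
    prev = (None,) * (n - 1)
    for key, values in zip(frame.index, frame.to_numpy()):
        key = key if isinstance(key, tuple) else (key,)
        outer = key[:-1]
        # first outer level that differs from the previous row (n - 1 if none)
        start = next((i for i in range(n - 1)
                      if outer[:i + 1] != prev[:i + 1]), n - 1)
        for i in range(start, n - 1):
            labels.append(outer[i])
            rows.append(blank)
            levels.append(i)
        labels.append(key[-1])
        rows.append(list(values))
        levels.append(n - 1)
        prev = outer
    result = pd.DataFrame(rows, index=labels, columns=frame.columns)
    result["Level"] = levels
    return result


# Demo frame: a subset of the aggregated table (mean and n rows, two treatments).
H, N = "HISPANIC OR LATINO", "NOT HISPANIC OR LATINO"
W, B = "WHITE", "BLACK OR AFRICAN AMERICAN"
index = pd.MultiIndex.from_tuples(
    [(H, B, "mean"), (H, B, "n"), (H, W, "mean"), (H, W, "n"),
     (N, W, "mean"), (N, W, "n")],
    names=["ETHNIC", "RACE", None],
)
demo = pd.DataFrame(
    [[np.nan, 42.910335], [np.nan, 1.0], [35.724846, 45.226557],
     [2.0, 1.0], [36.627881, 34.934976], [3.0, 2.0]],
    index=index,
    columns=pd.Index(["Placebo", "Treatment B"], name="TRT01A"),
)

flat = flatten_levels(demo)
print(flat)
```

Because each header row is emitted only when its level actually changes, this version should not need the separate step that drops duplicated level-0 rows. It assumes rows sharing the same outer keys are adjacent, as they are after the groupby and unstack calls above.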
    

