Pandas stack multiple columns to a single column
I have the following DataFrame:
    ETHNIC                  RACE                       AGE        TRT01A
0   NOT HISPANIC OR LATINO  WHITE                      31.824778  Treatment B
1   NOT HISPANIC OR LATINO  WHITE                      31.381246  Placebo
2   HISPANIC OR LATINO      WHITE                      45.522245  Treatment A
3   HISPANIC OR LATINO      BLACK OR AFRICAN AMERICAN  42.910335  Treatment B
4   NOT HISPANIC OR LATINO  WHITE                      31.381246  Placebo
5   NOT HISPANIC OR LATINO  WHITE                      38.045175  Treatment B
6   HISPANIC OR LATINO      WHITE                      39.337440  Placebo
7   NOT HISPANIC OR LATINO  WHITE                      47.121150  Placebo
8   NOT HISPANIC OR LATINO  WHITE                      38.203970  Treatment A
9   NOT HISPANIC OR LATINO  BLACK OR AFRICAN AMERICAN  22.926762  Placebo
10  HISPANIC OR LATINO      WHITE                      45.226557  Treatment B
11  HISPANIC OR LATINO      WHITE                      32.112252  Placebo
Just copy the DataFrame above to the clipboard and run df=pd.read_clipboard('\\s\\s+') to load it into a variable.
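If the clipboard is not convenient (for example in a script or a notebook on a remote machine), the same frame can be built directly. This is only a sketch that retypes the rows listed above; the variable name rows is introduced here for illustration.

```python
import pandas as pd

# The twelve rows from the table above, retyped as tuples.
rows = [
    ("NOT HISPANIC OR LATINO", "WHITE", 31.824778, "Treatment B"),
    ("NOT HISPANIC OR LATINO", "WHITE", 31.381246, "Placebo"),
    ("HISPANIC OR LATINO", "WHITE", 45.522245, "Treatment A"),
    ("HISPANIC OR LATINO", "BLACK OR AFRICAN AMERICAN", 42.910335, "Treatment B"),
    ("NOT HISPANIC OR LATINO", "WHITE", 31.381246, "Placebo"),
    ("NOT HISPANIC OR LATINO", "WHITE", 38.045175, "Treatment B"),
    ("HISPANIC OR LATINO", "WHITE", 39.337440, "Placebo"),
    ("NOT HISPANIC OR LATINO", "WHITE", 47.121150, "Placebo"),
    ("NOT HISPANIC OR LATINO", "WHITE", 38.203970, "Treatment A"),
    ("NOT HISPANIC OR LATINO", "BLACK OR AFRICAN AMERICAN", 22.926762, "Placebo"),
    ("HISPANIC OR LATINO", "WHITE", 45.226557, "Treatment B"),
    ("HISPANIC OR LATINO", "WHITE", 32.112252, "Placebo"),
]

# Default RangeIndex 0..11 matches the row labels shown in the question.
df = pd.DataFrame(rows, columns=["ETHNIC", "RACE", "AGE", "TRT01A"])
print(df)
```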
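import numpy as np
import pandas as pd

out = (df.groupby(['TRT01A', 'ETHNIC', 'RACE'])['AGE']
         .agg(mean=np.mean,
              n='count',
              deviation=np.std,
              # np.percentile takes q on a 0-100 scale, so 0.25 is the 0.25th
              # percentile; a true first quartile would be np.percentile(x, 25)
              Q1=lambda x: np.percentile(x, 0.25)
              )
         .T.unstack().unstack(0)
       )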
I ran some aggregations on the DataFrame above, transposed the result, and unstacked it twice to get the following:
TRT01A                                                          Placebo  Treatment A  Treatment B
ETHNIC                  RACE
HISPANIC OR LATINO      BLACK OR AFRICAN AMERICAN  mean             NaN          NaN    42.910335
                                                   n                NaN          NaN     1.000000
                                                   deviation        NaN          NaN          NaN
                                                   Q1               NaN          NaN    42.910335
                        WHITE                      mean       35.724846    45.522245    45.226557
                                                   n           2.000000     1.000000     1.000000
                                                   deviation   5.108979          NaN          NaN
                                                   Q1         32.130315    45.522245    45.226557
NOT HISPANIC OR LATINO  BLACK OR AFRICAN AMERICAN  mean       22.926762          NaN          NaN
                                                   n           1.000000          NaN          NaN
                                                   deviation        NaN          NaN          NaN
                                                   Q1         22.926762          NaN          NaN
                        WHITE                      mean       36.627881    38.203970    34.934976
                                                   n           3.000000     1.000000     2.000000
                                                   deviation   9.087438          NaN     4.398485
                                                   Q1         31.381246    38.203970    31.840329
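A note on the Q1 rows above: np.percentile expects q on a 0-100 scale, so np.percentile(x, 0.25) returns the 0.25th percentile, which sits very close to the group minimum. The short check below uses the two AGE values of the Placebo / HISPANIC OR LATINO / WHITE group from the input data; the variable names are illustrative only.

```python
import numpy as np

# AGE values of the Placebo / HISPANIC OR LATINO / WHITE group (rows 6 and 11).
ages = np.array([39.337440, 32.112252])

q_as_written = np.percentile(ages, 0.25)  # 0.25th percentile, as in the question
q_quartile = np.percentile(ages, 25)      # first quartile (25th percentile)

print(round(q_as_written, 6))  # 32.130315, the Q1 value shown in the table
print(round(q_quartile, 6))    # 33.918549
```

If a true first quartile is wanted, q=25 is the value to pass; the tables in this post were produced with q=0.25.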
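Now I want to flatten all the index levels to get the following structure (i.e. insert a NaN row for each value of every index level from the first to the second-to-last, plus a Level column giving the index level):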
                             Placebo  Treatment A  Treatment B  Level
HISPANIC OR LATINO               NaN          NaN          NaN      0  <---
BLACK OR AFRICAN AMERICAN        NaN          NaN          NaN      1  <---
mean                             NaN          NaN    42.910335      2
n                                NaN          NaN     1.000000      2
deviation                        NaN          NaN          NaN      2
Q1                               NaN          NaN    42.910335      2
WHITE                            NaN          NaN          NaN      1  <---
mean                       35.724846    45.522245    45.226557      2
n                           2.000000     1.000000     1.000000      2
deviation                   5.108979          NaN          NaN      2
Q1                         32.130315    45.522245    45.226557      2
NOT HISPANIC OR LATINO           NaN          NaN          NaN      0  <---
BLACK OR AFRICAN AMERICAN        NaN          NaN          NaN      1  <---
mean                       22.926762          NaN          NaN      2
n                           1.000000          NaN          NaN      2
deviation                        NaN          NaN          NaN      2
Q1                         22.926762          NaN          NaN      2
WHITE                            NaN          NaN          NaN      1  <---
mean                       36.627881    38.203970    34.934976      2
n                           3.000000     1.000000     2.000000      2
deviation                   9.087438          NaN     4.398485      2
Q1                         31.381246    38.203970    31.840329      2
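This is the same problem as in my previous question, but the catch is that there can be 1 to 4 index columns after aggregation (i.e. the aggregation may be applied over 1 to 5 columns), so the earlier solution is hard to reuse here.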
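First, use a custom function that puts header rows in front of each group. The header rows come from a helper DataFrame that is filled with NaN by default. The original answer joined the two with DataFrame.append, which was removed in pandas 2.0; pd.concat does the same job:
def f(x):
    # x.name is the (ETHNIC, RACE) group key: one all-NaN header row per key part
    names = pd.DataFrame(index=list(x.name), columns=x.columns, dtype=float).assign(Level=[0, 1])
    # Drop the two grouping levels so only the statistic name remains as the index
    body = x.reset_index(level=[0, 1], drop=True).assign(Level=2)
    return pd.concat([names, body])

out = out.groupby(level=[0, 1], group_keys=False).apply(f)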
Then remove the duplicated level-0 rows:
out = out[~out.index.duplicated() | out['Level'].isin([1,2])]
print (out)
TRT01A                       Placebo  Treatment A  Treatment B  Level
HISPANIC OR LATINO               NaN          NaN          NaN      0
BLACK OR AFRICAN AMERICAN        NaN          NaN          NaN      1
mean                             NaN          NaN    42.910335      2
n                                NaN          NaN     1.000000      2
deviation                        NaN          NaN          NaN      2
Q1                               NaN          NaN    42.910335      2
WHITE                            NaN          NaN          NaN      1
mean                       35.724846    45.522245    45.226557      2
n                           2.000000     1.000000     1.000000      2
deviation                   5.108979          NaN          NaN      2
Q1                         32.130315    45.522245    45.226557      2
NOT HISPANIC OR LATINO           NaN          NaN          NaN      0
BLACK OR AFRICAN AMERICAN        NaN          NaN          NaN      1
mean                       22.926762          NaN          NaN      2
n                           1.000000          NaN          NaN      2
deviation                        NaN          NaN          NaN      2
Q1                         22.926762          NaN          NaN      2
WHITE                            NaN          NaN          NaN      1
mean                       36.627881    38.203970    34.934976      2
n                           3.000000     1.000000     2.000000      2
deviation                   9.087438          NaN     4.398485      2
Q1                         31.381246    38.203970    31.840329      2
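The function above hardcodes two grouping levels, while the question says the number of index levels can vary. One possible way to handle any depth, offered as a sketch rather than as part of the original answer, is to walk the rows in order and emit a header row whenever the key prefix at a given level changes. The name flatten_index and the toy frame demo are hypothetical, and the sketch assumes the index is already sorted so that equal prefixes are contiguous.

```python
import numpy as np
import pandas as pd


def flatten_index(frame):
    """Collapse all index levels of `frame` into one flat index plus a Level column.

    Assumes rows with the same leading keys are contiguous (e.g. a sorted index).
    """
    n = frame.index.nlevels
    blank = [np.nan] * frame.shape[1]
    labels, levels, values = [], [], []
    prev = ()
    for key, row in zip(frame.index, frame.to_numpy()):
        key = key if isinstance(key, tuple) else (key,)
        # Emit a NaN header row for each outer level whose key prefix changed.
        for i in range(n - 1):
            if key[: i + 1] != prev[: i + 1]:
                labels.append(key[i])
                levels.append(i)
                values.append(blank)
        # The innermost level carries the actual data row.
        labels.append(key[-1])
        levels.append(n - 1)
        values.append(list(row))
        prev = key
    result = pd.DataFrame(values, index=labels, columns=frame.columns)
    result["Level"] = levels
    return result


# Toy three-level frame (made-up keys and values, for illustration only).
idx = pd.MultiIndex.from_tuples([
    ("A", "x", "mean"), ("A", "x", "n"),
    ("A", "y", "mean"), ("A", "y", "n"),
    ("B", "x", "mean"), ("B", "x", "n"),
])
demo = pd.DataFrame({"Placebo": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=idx)
flat = flatten_index(demo)
print(flat)
```

Because each header row is emitted only when its prefix changes, no duplicate-removal step is needed afterwards, and a frame with a single index level passes through with Level 0 on every row.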