Pandas stack multiple columns to a single column
I have the following DataFrame:
ETHNIC RACE AGE TRT01A
0 NOT HISPANIC OR LATINO WHITE 31.824778 Treatment B
1 NOT HISPANIC OR LATINO WHITE 31.381246 Placebo
2 HISPANIC OR LATINO WHITE 45.522245 Treatment A
3 HISPANIC OR LATINO BLACK OR AFRICAN AMERICAN 42.910335 Treatment B
4 NOT HISPANIC OR LATINO WHITE 31.381246 Placebo
5 NOT HISPANIC OR LATINO WHITE 38.045175 Treatment B
6 HISPANIC OR LATINO WHITE 39.337440 Placebo
7 NOT HISPANIC OR LATINO WHITE 47.121150 Placebo
8 NOT HISPANIC OR LATINO WHITE 38.203970 Treatment A
9 NOT HISPANIC OR LATINO BLACK OR AFRICAN AMERICAN 22.926762 Placebo
10 HISPANIC OR LATINO WHITE 45.226557 Treatment B
11 HISPANIC OR LATINO WHITE 32.112252 Placebo
Copy the DataFrame above to the clipboard and run df = pd.read_clipboard(sep='\\s\\s+') to load it into a variable.
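If the clipboard is not available (for example on a headless machine), the same frame can also be built directly. This is only a convenience sketch; the rows are copied from the table above and the variable name `rows` is an illustrative choice.

```python
import pandas as pd

# Rows copied from the table in the question: (ETHNIC, RACE, AGE, TRT01A)
rows = [
    ("NOT HISPANIC OR LATINO", "WHITE", 31.824778, "Treatment B"),
    ("NOT HISPANIC OR LATINO", "WHITE", 31.381246, "Placebo"),
    ("HISPANIC OR LATINO", "WHITE", 45.522245, "Treatment A"),
    ("HISPANIC OR LATINO", "BLACK OR AFRICAN AMERICAN", 42.910335, "Treatment B"),
    ("NOT HISPANIC OR LATINO", "WHITE", 31.381246, "Placebo"),
    ("NOT HISPANIC OR LATINO", "WHITE", 38.045175, "Treatment B"),
    ("HISPANIC OR LATINO", "WHITE", 39.337440, "Placebo"),
    ("NOT HISPANIC OR LATINO", "WHITE", 47.121150, "Placebo"),
    ("NOT HISPANIC OR LATINO", "WHITE", 38.203970, "Treatment A"),
    ("NOT HISPANIC OR LATINO", "BLACK OR AFRICAN AMERICAN", 22.926762, "Placebo"),
    ("HISPANIC OR LATINO", "WHITE", 45.226557, "Treatment B"),
    ("HISPANIC OR LATINO", "WHITE", 32.112252, "Placebo"),
]
df = pd.DataFrame(rows, columns=["ETHNIC", "RACE", "AGE", "TRT01A"])
print(df.head())
```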
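import numpy as np
import pandas as pd

out = (df.groupby(['TRT01A', 'ETHNIC', 'RACE'])['AGE']
         .agg(mean=np.mean,
              n='count',
              deviation=np.std,
              # note: q is on a 0-100 scale, so 0.25 is the 0.25th percentile, not Q1
              Q1=lambda x: np.percentile(x, 0.25))
         .T.unstack().unstack(0)
      )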
I computed some aggregates on the DataFrame above, transposed the result, and unstacked it twice to get the following:
TRT01A Placebo Treatment A Treatment B
ETHNIC RACE
HISPANIC OR LATINO BLACK OR AFRICAN AMERICAN mean NaN NaN 42.910335
n NaN NaN 1.000000
deviation NaN NaN NaN
Q1 NaN NaN 42.910335
WHITE mean 35.724846 45.522245 45.226557
n 2.000000 1.000000 1.000000
deviation 5.108979 NaN NaN
Q1 32.130315 45.522245 45.226557
NOT HISPANIC OR LATINO BLACK OR AFRICAN AMERICAN mean 22.926762 NaN NaN
n 1.000000 NaN NaN
deviation NaN NaN NaN
Q1 22.926762 NaN NaN
WHITE mean 36.627881 38.203970 34.934976
n 3.000000 1.000000 2.000000
deviation 9.087438 NaN 4.398485
Q1 31.381246 38.203970 31.840329
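A side note on the Q1 row: np.percentile expects q on a 0-100 scale, whereas Series.quantile uses 0-1. The small check below uses the two AGE values of the HISPANIC OR LATINO / WHITE / Placebo group; `as_written` should reproduce the 32.130315 shown in the table, while `first_quartile` is what q=25 would give. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# AGE values of the HISPANIC OR LATINO / WHITE / Placebo group
ages = pd.Series([39.337440, 32.112252])

as_written = np.percentile(ages, 0.25)     # 0.25th percentile (q on a 0-100 scale)
first_quartile = np.percentile(ages, 25)   # 25th percentile, i.e. the first quartile
via_quantile = ages.quantile(0.25)         # Series.quantile takes q on a 0-1 scale

print(round(as_written, 6), round(first_quartile, 6), round(via_quantile, 6))
```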
Now I want to unstack all the indices to get the following structure (i.e. insert NaN rows for every index column from the first to the second-to-last, along with a Level column denoting the level of the index):
Placebo Treatment A Treatment B Level
HISPANIC OR LATINO NaN NaN NaN 0 <---
BLACK OR AFRICAN AMERICAN NaN NaN NaN 1 <---
mean NaN NaN 42.910335 2
n NaN NaN 1.000000 2
deviation NaN NaN NaN 2
Q1 NaN NaN 42.910335 2
WHITE NaN NaN NaN 1 <---
mean 35.724846 45.522245 45.226557 2
n 2.000000 1.000000 1.000000 2
deviation 5.108979 NaN NaN 2
Q1 32.130315 45.522245 45.226557 2
NOT HISPANIC OR LATINO NaN NaN NaN 0 <---
BLACK OR AFRICAN AMERICAN NaN NaN NaN 1 <---
mean 22.926762 NaN NaN 2
n 1.000000 NaN NaN 2
deviation NaN NaN NaN 2
Q1 22.926762 NaN NaN 2
WHITE NaN NaN NaN 1 <---
mean 36.627881 38.203970 34.934976 2
n 3.000000 1.000000 2.000000 2
deviation 9.087438 NaN 4.398485 2
Q1 31.381246 38.203970 31.840329 2
This question is identical to the previous question that I asked, but there can be anywhere from 1 to 4 index columns after aggregating (i.e. the aggregate may be applied on 1 to 5 columns), which makes it difficult to reuse the previous solution in this scenario.
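Use a custom function that first builds a helper DataFrame filled with NaN by default and then concatenates the group's rows to it with pd.concat (DataFrame.append did the same job before it was removed in pandas 2.0):
def f(x):
    # x.name is the (ETHNIC, RACE) group key: one all-NaN header row per level
    names = pd.DataFrame(index=x.name, columns=x.columns).assign(Level=[0, 1])
    # drop the two outer index levels and mark the statistic rows as Level 2
    body = x.reset_index(level=[0, 1], drop=True).assign(Level=2)
    return pd.concat([names, body])

out = out.groupby(level=[0, 1], group_keys=False).apply(f)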
Then remove the duplicated Level 0 rows:
out = out[~out.index.duplicated() | out['Level'].isin([1,2])]
print (out)
TRT01A Placebo Treatment A Treatment B Level
HISPANIC OR LATINO NaN NaN NaN 0
BLACK OR AFRICAN AMERICAN NaN NaN NaN 1
mean NaN NaN 42.910335 2
n NaN NaN 1.000000 2
deviation NaN NaN NaN 2
Q1 NaN NaN 42.910335 2
WHITE NaN NaN NaN 1
mean 35.724846 45.522245 45.226557 2
n 2.000000 1.000000 1.000000 2
deviation 5.108979 NaN NaN 2
Q1 32.130315 45.522245 45.226557 2
NOT HISPANIC OR LATINO NaN NaN NaN 0
BLACK OR AFRICAN AMERICAN NaN NaN NaN 1
mean 22.926762 NaN NaN 2
n 1.000000 NaN NaN 2
deviation NaN NaN NaN 2
Q1 22.926762 NaN NaN 2
WHITE NaN NaN NaN 1
mean 36.627881 38.203970 34.934976 2
n 3.000000 1.000000 2.000000 2
deviation 9.087438 NaN 4.398485 2
Q1 31.381246 38.203970 31.840329 2
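The function above hard-codes two outer levels. For the 1-to-4-level case mentioned in the question, one possible generalisation is sketched below. `flatten_with_levels` and the toy frame `demo` are illustrative names, not part of the original answer, and the sketch assumes that rows sharing the same outer key are contiguous (as they are after groupby/unstack) and that index labels are not NaN.

```python
import numpy as np
import pandas as pd


def flatten_with_levels(frame: pd.DataFrame) -> pd.DataFrame:
    """Flatten a (Multi)Index into single labels, adding all-NaN header rows.

    Hypothetical helper: whenever an outer index level changes, a header row
    is emitted for it; ``Level`` records the depth of every row.
    """
    empty = np.full(len(frame.columns), np.nan)
    labels, rows, levels = [], [], []
    previous = ()  # outer key of the previous data row
    for key, row in frame.iterrows():
        key = key if isinstance(key, tuple) else (key,)
        outer = key[:-1]
        # depth of the first outer level that differs from the previous row
        changed = 0
        while changed < len(previous) and previous[changed] == outer[changed]:
            changed += 1
        # header rows for every outer level from the changed one downwards
        for depth in range(changed, len(outer)):
            labels.append(outer[depth])
            rows.append(empty)
            levels.append(depth)
        # the data row itself sits at the innermost depth
        labels.append(key[-1])
        rows.append(row.to_numpy())
        levels.append(len(outer))
        previous = outer
    result = pd.DataFrame(rows, index=labels, columns=frame.columns)
    result["Level"] = levels
    return result


# Toy frame with three index levels (two outer levels plus the statistic name)
idx = pd.MultiIndex.from_product([["A", "B"], ["x", "y"], ["mean", "n"]])
demo = pd.DataFrame({"P": np.arange(8.0), "T": np.arange(8.0) * 2}, index=idx)
flat = flatten_with_levels(demo)
print(flat)
```

Because the loop only looks at how the outer key changes from one row to the next, the same function should work unchanged for a frame with a single index level (no header rows are added) or with more outer levels.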