
Pandas unstack stack to fill missing features with nans

This question is a follow-up to this SO question: Pandas: add columns to multiindex for any depht of index levels

In contrast to that question, I have the following DataFrame:

import numpy as np
import pandas as pd

index = [['A', 'B', 'C', 'D'], ['a', 'b', 'a', 'b']]
# Column levels: group, feature, physical unit of the feature
cols = [['AC', 'AC', 'BC', 'DC', 'CC'], ['ac', 'aac', 'bc', 'ac', 'bc'], ['AAc', 'AAAAc', 'BBc', 'AAc', 'BBc']]
data = np.random.random((4, 5))
df = pd.DataFrame(data=data, index=index, columns=cols)
df.columns.names = ['col_name_0', 'col_name_1', 'col_name_2']

If I apply the solution from the previous post, I get too many columns, because the level 'col_name_2' is also broadcast to all groups of level 0.

The solution from the cited question was:

out = df.stack(level=1).unstack().swaplevel(1, 2, axis=1)

But this yields:

col_name_0        AC                                BC                CC                DC              
col_name_1       aac        ac              bc     aac  ac        bc aac  ac        bc aac        ac  bc
col_name_2     AAAAc AAc AAAAc       AAc AAAAc AAc BBc BBc       BBc BBc BBc       BBc AAc       AAc AAc
A a         0.908180 NaN   NaN  0.383903   NaN NaN NaN NaN  0.993260 NaN NaN  0.112402 NaN  0.196868 NaN
B b         0.901394 NaN   NaN  0.096745   NaN NaN NaN NaN  0.260379 NaN NaN  0.723057 NaN  0.194833 NaN

The level col_name_2 holds the physical units that belong to the corresponding features in the level col_name_1. Accordingly, column number 1 (zero-based index) does not make any sense. The same goes for column 3. Do you know how I could i) keep the units and ii) broadcast only col_name_1 across all groups?

My current approach is to drop the level col_name_2 before stacking and unstacking, but this requires an extra dictionary to map the units back to the features. That is not too bad, but maybe there is a more elegant solution.
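For reference, the workaround described above could look roughly like the sketch below. It assumes that every feature in col_name_1 has exactly one unit in col_name_2, which holds for the example data; the names units and wide are only illustrative. Depending on the pandas version, stack may emit a FutureWarning about its newer implementation.

```python
import numpy as np
import pandas as pd

index = [['A', 'B', 'C', 'D'], ['a', 'b', 'a', 'b']]
cols = [['AC', 'AC', 'BC', 'DC', 'CC'],
        ['ac', 'aac', 'bc', 'ac', 'bc'],
        ['AAc', 'AAAAc', 'BBc', 'AAc', 'BBc']]
df = pd.DataFrame(np.random.random((4, 5)), index=index, columns=cols)
df.columns.names = ['col_name_0', 'col_name_1', 'col_name_2']

# Remember which unit belongs to which feature before dropping the unit level.
# Assumption: one unit per feature, otherwise later pairs overwrite earlier ones.
units = dict(zip(df.columns.get_level_values('col_name_1'),
                 df.columns.get_level_values('col_name_2')))

# Drop the units, then broadcast the features across all level-0 groups.
wide = df.droplevel('col_name_2', axis=1).stack(level='col_name_1').unstack()

# Re-attach the unit level from the mapping.
wide.columns = pd.MultiIndex.from_tuples(
    [(group, feature, units[feature]) for group, feature in wide.columns],
    names=list(df.columns.names),
)
print(wide.shape)
```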

How about:

df.stack(level=(1,2)).unstack(level=(-1,-2))

Output:

col_name_0        AC                  BC      ...  CC              DC
col_name_2     AAAAc       AAc BBc AAAAc AAc  ... AAc       BBc AAAAc       AAc BBc
col_name_1       aac        ac  bc   aac  ac  ...  ac        bc   aac        ac  bc
A a         0.724763  0.688566 NaN   NaN NaN  ... NaN  0.854830   NaN  0.653829 NaN
B b         0.990737  0.689543 NaN   NaN NaN  ... NaN  0.486084   NaN  0.027718 NaN
C a         0.822234  0.122896 NaN   NaN NaN  ... NaN  0.580121   NaN  0.043333 NaN
D b         0.269341  0.503598 NaN   NaN NaN  ... NaN  0.447615   NaN  0.384507 NaN
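Note that the output above lists col_name_2 before col_name_1. If the original level order matters, a sketch along the following lines might help: it repeats the stack/unstack call, restores the level order with reorder_levels, and checks that no new feature/unit combinations were created. The variable names orig_pairs and new_pairs are only illustrative, and stack may emit a FutureWarning depending on the pandas version.

```python
import numpy as np
import pandas as pd

index = [['A', 'B', 'C', 'D'], ['a', 'b', 'a', 'b']]
cols = [['AC', 'AC', 'BC', 'DC', 'CC'],
        ['ac', 'aac', 'bc', 'ac', 'bc'],
        ['AAc', 'AAAAc', 'BBc', 'AAc', 'BBc']]
df = pd.DataFrame(np.random.random((4, 5)), index=index, columns=cols)
df.columns.names = ['col_name_0', 'col_name_1', 'col_name_2']

# Stack feature and unit together so each feature keeps its own unit,
# then unstack both levels again.
out = df.stack(level=[1, 2]).unstack(level=[-1, -2])

# Optional: restore the original level order and sort the columns.
out = out.reorder_levels(['col_name_0', 'col_name_1', 'col_name_2'], axis=1)
out = out.sort_index(axis=1)

# Every (feature, unit) pair in the result should already exist in df.
orig_pairs = set(zip(df.columns.get_level_values('col_name_1'),
                     df.columns.get_level_values('col_name_2')))
new_pairs = set(zip(out.columns.get_level_values('col_name_1'),
                    out.columns.get_level_values('col_name_2')))
print(new_pairs == orig_pairs)
print(out.shape)
```

In this layout each level-0 group carries the same three feature/unit columns, and combinations that were absent in the input are filled with NaN.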
