Unmelting a pandas dataframe with two columns

Suppose I have a dataframe

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.normal(size=(10, 3)), columns=list('abc'))

I melt the dataframe using pd.melt so that it looks like

variable    value
a           0.2
a           0.03
a          -0.99
a           0.86
a           1.74
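For anyone who wants to reproduce this, here is a minimal setup sketch. The seeded generator `rng` is my own assumption (the question uses unseeded `np.random.normal`), added only so repeated runs give the same numbers.

```python
import numpy as np
import pandas as pd

# Seeded generator (an assumption, not in the question) for repeatable values
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(10, 3)), columns=list("abc"))

# melt stacks the three columns into one long 'value' column;
# 'variable' records which original column each value came from
d1 = df.melt()

print(d1.shape)  # (30, 2)
print(d1.head())
```

Note that the melted frame gets a fresh index running from 0 to 29, so the original row labels are not kept.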

Now I would like to undo the melt. Using pivot(columns='variable') almost works, but the result is full of NaN values:

        a   b   c
0     0.2   NaN NaN
1     0.03  NaN NaN
2    -0.99  NaN NaN
3     0.86  NaN NaN
4     1.74  NaN NaN

How can I unmelt the dataframe so that it is as before?
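The NaN values appear to come from the index: the melted frame has its own labels 0 to 29, so pivot puts every value on a separate row. A small sketch that shows this, using the same kind of frame as in the question (the names `d1` and `wide` are just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.normal(size=(10, 3)), columns=list("abc"))
d1 = df.melt()

# With no index argument, pivot uses the existing labels 0..29,
# so each row of the result holds exactly one real value
wide = d1.pivot(columns="variable", values="value")

print(wide.shape)                            # (30, 3)
print(wide.notna().sum(axis=1).eq(1).all())  # True
```

So any fix has to supply a row position that repeats for each variable (0 to 9 for a, 0 to 9 for b, and so on).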

A few ideas:
Assuming d1 is df.melt()

Option 1
groupby + comprehension

pd.DataFrame({n: list(s) for n, s in d1.groupby('variable').value})

          a         b         c
0 -1.087129 -1.264522  1.147618
1  0.403731  0.416867 -0.367249
2 -0.920536  0.442650 -0.351229
3 -1.193876 -0.342237 -2.001431
4 -1.596659 -1.223354  1.323841
5  0.753658 -0.891211  0.541265
6  0.455577 -1.059572  1.017490
7 -0.153736  0.050007 -0.280192
8  1.189587  0.405647 -0.102023
9 -0.103273  0.200320 -0.630194

Option 2
pd.DataFrame.set_index

d1.set_index([d1.groupby('variable').cumcount(), 'variable']).value.unstack()

variable         a         b         c
0        -1.087129 -1.264522  1.147618
1         0.403731  0.416867 -0.367249
2        -0.920536  0.442650 -0.351229
3        -1.193876 -0.342237 -2.001431
4        -1.596659 -1.223354  1.323841
5         0.753658 -0.891211  0.541265
6         0.455577 -1.059572  1.017490
7        -0.153736  0.050007 -0.280192
8         1.189587  0.405647 -0.102023
9        -0.103273  0.200320 -0.630194
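A quick way to check that Option 2 really undoes the melt is to compare the result with the original frame. This is only a sketch: the names `pos` and `restored` are mine, and the `rename_axis` call is an extra step to drop the leftover 'variable' label on the columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.normal(size=(10, 3)), columns=list("abc"))
d1 = df.melt()

# Number the rows within each variable: 0..9 for 'a', 0..9 for 'b', ...
pos = d1.groupby("variable").cumcount()

restored = (
    d1.set_index([pos, "variable"])["value"]
      .unstack()
      .rename_axis(columns=None)  # drop the leftover 'variable' label
)

print(np.allclose(restored.to_numpy(), df.to_numpy()))  # True
```

One thing to keep in mind: unstack sorts the column labels, so if the original columns were not in sorted order you would need something like restored[df.columns] to get the original order back.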

Use groupby, apply and unstack (here df is the melted dataframe).

df.groupby('variable')['value']\
     .apply(lambda x: pd.Series(x.values)).unstack().T

variable         a         b         c
0         0.617037 -0.321493  0.747025
1         0.576410 -0.498173  0.185723
2        -1.563912  0.741198  1.439692
3        -1.305317  1.203608 -1.112820
4         1.287638  1.649580  0.404494
5         0.923544  0.988020 -1.918680
6         0.497406 -1.373345  0.074963
7         0.528444 -0.019914 -1.666261
8         0.260955  0.103575  0.190424
9         0.614411 -0.165363 -0.149514

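Another method uses pivot and transform. It only works if the value column has no NaN values of its own, because the NaN values created by pivot are pushed to the bottom of each column and then dropped, i.e.: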

df1 = df.melt()
# parentheses let the method chain span several lines
(df1.pivot(columns='variable', values='value')
    .transform(lambda x: sorted(x, key=pd.isnull))
    .dropna())

Output:

variable         a         b         c
0         1.596937  0.431029  0.345441
1        -0.493352  0.135649 -1.559669
2         0.548048  0.667752  0.258160
3        -0.251368 -0.265106 -2.339768
4        -0.397010 -0.381193 -0.359447
5        -0.945300  0.520029  0.362570
6        -0.883771 -0.612628 -0.478003
7         0.833100 -0.387262 -1.195496
8        -1.310178 -0.748359  0.073014
9         0.753457  1.105500 -0.895841
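If you control the melt step yourself, it may be simpler to keep the original row labels so that pivot works directly. This is not one of the approaches above; it is a sketch that assumes pandas 1.1 or later, where melt accepts ignore_index. The names `long` and `wide` are just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.normal(size=(10, 3)), columns=list("abc"))

# ignore_index=False (pandas 1.1 or later) keeps the original row labels,
# so each label 0..9 appears once per variable
long = df.melt(ignore_index=False)

# (row label, variable) pairs are now unique, so pivot fills every cell
wide = long.pivot(columns="variable", values="value")

print(wide.shape)               # (10, 3)
print(wide.isna().any().any())  # False
```

This does not help if the frame was already melted with the default settings, in which case the cumcount or groupby approaches are still needed.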
