Pandas: How to (cleanly) unpivot two columns with same category?

I'm trying to unpivot two groups of columns in a pandas DataFrame. The transformation I'm after is the inverse of the one in this question.

We start with a dataset that looks like this:

import pandas as pd
import numpy as np
df_orig = pd.DataFrame(data=np.random.randint(255, size=(4,5)),
                       columns=['accuracy','time_a','time_b','memory_a', 'memory_b'])
df_orig

   accuracy  time_a  time_b  memory_a  memory_b
0         6     118     170       102       239
1       241       9     166       159       162
2       164      70      76       228       121
3       228     121     135       128        92

I want to unpivot both the memory and time columns, so that the result looks like this:

df

    accuracy  memory category  time
0          6     102        a   118
1        241     159        a     9
2        164     228        a    70
3        228     128        a   121
12         6     239        b   170
13       241     162        b   166
14       164     121        b    76
15       228      92        b   135

So far I have managed to get my desired output using df.melt() twice plus some extra commands:

df = df_orig.copy()

# Unpivot memory columns
df = df.melt(id_vars=['accuracy','time_a', 'time_b'],
             value_vars=['memory_a', 'memory_b'],
             value_name='memory',
             var_name='mem_cat')

# Unpivot time columns
df = df.melt(id_vars=['accuracy','memory', 'mem_cat'],
             value_vars=['time_a', 'time_b'],
             value_name='time',
             var_name='time_cat')

# Keep only the 'a'/'b' as categories
df.mem_cat = df.mem_cat.str[-1]
df.time_cat = df.time_cat.str[-1]

# Keep only the rows whose memory and time categories match (DIRTY!)
df = df[df.mem_cat==df.time_cat]

# Removing the duplicated category column.
df = df.drop(columns='time_cat').rename(columns={"mem_cat":'category'})
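
To see why the filtering step is needed, here is a small check. It is only a sketch, and it assumes the fixed sample values printed above instead of random data; the names step1, step2 and matching are made up for illustration. The second melt pairs every memory row with every time column, so the row count doubles again, and only half of those rows have matching suffixes.

```python
import pandas as pd

# Sample values copied from the printed df_orig above (assumed fixed, not random).
df_orig = pd.DataFrame({
    'accuracy': [6, 241, 164, 228],
    'time_a':   [118, 9, 70, 121],
    'time_b':   [170, 166, 76, 135],
    'memory_a': [102, 159, 228, 128],
    'memory_b': [239, 162, 121, 92],
})

# First melt: 4 rows x 2 memory columns -> 8 rows.
step1 = df_orig.melt(id_vars=['accuracy', 'time_a', 'time_b'],
                     value_vars=['memory_a', 'memory_b'],
                     value_name='memory', var_name='mem_cat')

# Second melt: 8 rows x 2 time columns -> 16 rows, including a/b mismatches.
step2 = step1.melt(id_vars=['accuracy', 'memory', 'mem_cat'],
                   value_vars=['time_a', 'time_b'],
                   value_name='time', var_name='time_cat')

# Keep only rows where both suffixes agree ('a' with 'a', 'b' with 'b').
matching = step2[step2['mem_cat'].str[-1] == step2['time_cat'].str[-1]]

print(len(step1), len(step2), len(matching))  # 8 16 8
print(list(matching.index))                   # [0, 1, 2, 3, 12, 13, 14, 15]
```

This also explains the index gap (0 to 3, then 12 to 15) in the desired output shown earlier: rows 4 to 11 are the mismatched a/b combinations that get filtered out.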

Given how easy it was to solve the inverse question, I believe my code is far more complex than it needs to be. Is there a cleaner way to do this?

Use pd.wide_to_long:

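import numpy as np
import pandas as pd

np.random.seed(123)  # fixed seed so the run is repeatable
df_orig = pd.DataFrame(data=np.random.randint(255, size=(4,5)),
                       columns=['accuracy','time_a','time_b','memory_a', 'memory_b'])

# reset_index() supplies a unique id column ('index') for wide_to_long;
# sep='_' and suffix=r'\w+' split e.g. 'time_a' into stub 'time' and category 'a'.
df = (pd.wide_to_long(df_orig.reset_index(),
                      stubnames=['time','memory'],
                      i='index',
                      j='category',
                      sep='_',
                      suffix=r'\w+')
        .reset_index(level=1)     # move 'category' out of the index into a column
        .reset_index(drop=True)   # discard the original row ids
        .rename_axis(None))
print(df)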
  category  accuracy  time  memory
0        a       254   109      66
1        a        98   230      83
2        a       123    57     225
3        a       113   126      73
4        b       254   126     220
5        b        98    17     106
6        b       123   214      96
7        b       113    47      32
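
As a quick sanity check, the same call can be run on the fixed sample values from the question instead of random data. This is a minimal sketch under that assumption; long_df and row are illustrative names only. It confirms that each (accuracy, category) pair ends up with the time and memory values taken from the matching suffixed columns.

```python
import pandas as pd

# Sample values copied from the question's printed df_orig (assumed fixed).
df_orig = pd.DataFrame({
    'accuracy': [6, 241, 164, 228],
    'time_a':   [118, 9, 70, 121],
    'time_b':   [170, 166, 76, 135],
    'memory_a': [102, 159, 228, 128],
    'memory_b': [239, 162, 121, 92],
})

long_df = (pd.wide_to_long(df_orig.reset_index(),
                           stubnames=['time', 'memory'],
                           i='index', j='category',
                           sep='_', suffix=r'\w+')
             .reset_index(level='category')  # 'category' becomes a column
             .reset_index(drop=True))        # drop the original row ids

# Look up the row that came from time_b / memory_b of the first original row.
row = long_df[(long_df['category'] == 'b') & (long_df['accuracy'] == 6)].iloc[0]
print(row['time'], row['memory'])
```

Unlike the double melt, no a/b mismatches are ever produced here, so there is nothing to filter afterwards.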
