Melting pandas data frame with multiple variable names and multiple value names

Question

How can I melt a pandas data frame using multiple variable names and values? I have the following data frame that changes its shape in a for loop. In one of the for loop iterations, it looks like this:

ID  Cat    Class_A   Class_B     Prob_A     Prob_B
1   Veg      1        2          0.9         0.1
2   Veg      1        2          0.8         0.2
3   Meat     1        2          0.6         0.4
4   Meat     1        2          0.3         0.7
5   Veg      1        2          0.2         0.8

I need to melt it in such a way that it looks like this:

ID  Cat    Class     Prob    
1   Veg      1       0.9       
1   Veg      2       0.1
2   Veg      1       0.8        
2   Veg      2       0.2
3   Meat     1       0.6         
3   Meat     2       0.4
4   Meat     1       0.3         
4   Meat     2       0.7
5   Veg      1       0.2         
5   Veg      2       0.8

During the for loop the data frame will contain different number of classes with their probabilities. That is why I am looking for a general approach that is applicable in all my for loop iterations. I saw this question and this but they were not helpful!

Answer 1

You need lreshape by dict for specify categories:

d = {'Class':['Class_A', 'Class_B'], 'Prob':['Prob_A','Prob_B']}
df = pd.lreshape(df,d)
print (df)
    Cat  ID  Class  Prob
0   Veg   1      1   0.9
1   Veg   2      1   0.8
2  Meat   3      1   0.6
3  Meat   4      1   0.3
4   Veg   5      1   0.2
5   Veg   1      2   0.1
6   Veg   2      2   0.2
7  Meat   3      2   0.4
8  Meat   4      2   0.7
9   Veg   5      2   0.8

More dynamic solution:

Class = [col for col in df.columns if col.startswith('Class')]
Prob = [col for col in df.columns if col.startswith('Prob')]
df = pd.lreshape(df, {'Class':Class, 'Prob':Prob})
print (df)
    Cat  ID  Class  Prob
0   Veg   1      1   0.9
1   Veg   2      1   0.8
2  Meat   3      1   0.6
3  Meat   4      1   0.3
4   Veg   5      1   0.2
5   Veg   1      2   0.1
6   Veg   2      2   0.2
7  Meat   3      2   0.4
8  Meat   4      2   0.7
9   Veg   5      2   0.8

EDIT:

lreshape is now undocumented, but is possible in future will by removed ( with pd.wide_to_long too ).

Possible solution is merging all 3 functions to one - maybe melt , but now it is not implementated. Maybe in some new version of pandas. Then my answer will be updated.

Answer 2

Or you can try this by using str.contain and pd.concat

DF1=df2.loc[:,df2.columns.str.contains('_A|Cat|ID')]
name=['ID','Cat','Class','Prob']
DF1.columns=name
DF2=df2.loc[:,df2.columns.str.contains('_B|Cat|ID')]
DF2.columns=name
pd.concat([DF1,DF2],axis=0)

Out[354]: 
   ID   Cat  Class  Prob
0   1   Veg      1   0.9
1   2   Veg      1   0.8
2   3  Meat      1   0.6
3   4  Meat      1   0.3
4   5   Veg      1   0.2
0   1   Veg      2   0.1
1   2   Veg      2   0.2
2   3  Meat      2   0.4
3   4  Meat      2   0.7
4   5   Veg      2   0.8

Answer 3

The top voted answer uses the undocumented lreshape which may at some point get deprecated because of its similarity to pd.wide_to_long which is documented and can use directly here. By default suffix matches only to numbers. You must change this to match characters (here I just used any character).

pd.wide_to_long(df, stubnames=['Class', 'Prob'], i=['ID', 'Cat'], j='DROPME', suffix='.')\
  .reset_index()\
  .drop('DROPME', axis=1)

   ID   Cat  Class  Prob
0   1   Veg      1   0.9
1   1   Veg      2   0.1
2   2   Veg      1   0.8
3   2   Veg      2   0.2
4   3  Meat      1   0.6
5   3  Meat      2   0.4
6   4  Meat      1   0.3
7   4  Meat      2   0.7
8   5   Veg      1   0.2
9   5   Veg      2   0.8

Answer 4

You could also use pd.melt .

# Make DataFrame
df = pd.DataFrame({'ID' :  [i for i in range(1,6)],
                   'Cat' : ['Veg']*2 + ['Meat']*2 + ['Veg'],
                   'Class_A' : [1]*5,
                   'Class_B' : [2]*5,
                   'Prob_A' : [0.9, 0.8, 0.6, 0.3, 0.2],
                   'Prob_B' : [0.1, 0.2, 0.4, 0.7, 0.8]})

# Make class dataframe and prob dataframe
df_class = df.loc[:, ['ID', 'Cat', 'Class_A', 'Class_B']]
df_prob = df.loc[:, ['ID', 'Cat', 'Prob_A', 'Prob_B']]

# Melt class dataframe and prob dataframe
df_class = df_class.melt(id_vars = ['ID',
                                    'Cat'],
                         value_vars = ['Class_A',
                                       'Class_B'],
                         value_name = 'Class')
df_prob = df_prob.melt(id_vars = ['ID',
                                  'Cat'],
                       value_vars = ['Prob_A',
                                     'Prob_B'],
                       value_name = 'Prob')

# Clean variable column so only 'A','B' is left in both dataframes
df_class.loc[:, 'variable'] = df_class.loc[:, 'variable'].str.partition('_')[2]
df_prob.loc[:, 'variable'] = df_prob.loc[:, 'variable'].str.partition('_')[2]

# Merge class dataframe with prob dataframe on 'ID', 'Cat', and 'variable';
# drop 'variable'; sort values by 'ID', 'Cat'
final = df_class.merge(df_prob,
                       how = 'inner',
                       on = ['ID',
                             'Cat',
                             'variable']).drop('variable', axis = 1).sort_values(by = ['ID',
                                                                                       'Cat'])

Answer 5

One option is pivot_longer from pyjanitor , which abstracts the process, and is efficient:

# pip install janitor
import janitor

df.pivot_longer(
    index = ['ID', 'Cat'], 
    names_to = '.value', 
    names_pattern = '([a-zA-Z]+)_*')

   ID   Cat  Class  Prob
0   1   Veg      1   0.9
1   2   Veg      1   0.8
2   3  Meat      1   0.6
3   4  Meat      1   0.3
4   5   Veg      1   0.2
5   1   Veg      2   0.1
6   2   Veg      2   0.2
7   3  Meat      2   0.4
8   4  Meat      2   0.7
9   5   Veg      2   0.8

The idea for this particular reshape is that whatever group in the regular expression is paired with the .value stays as the column header.

Melting pandas data frame with multiple variable names and multiple value names

Question

5 answers

solution1
13 ACCPTED 2017-08-13 16:18:37

solution2
4 2017-08-13 16:33:03

solution3
0 2017-08-18 18:05:49

solution4
0 2017-09-25 13:57:55

solution5
0 2022-04-23 14:24:02

Melting pandas data frame with multiple variable names and multiple value names

Question

5 answers

solution1 13 ACCPTED 2017-08-13 16:18:37

solution2 4 2017-08-13 16:33:03

solution3 0 2017-08-18 18:05:49

solution4 0 2017-09-25 13:57:55

solution5 0 2022-04-23 14:24:02

solution1
13 ACCPTED 2017-08-13 16:18:37

solution2
4 2017-08-13 16:33:03

solution3
0 2017-08-18 18:05:49

solution4
0 2017-09-25 13:57:55

solution5
0 2022-04-23 14:24:02