Reshape Pandas Dataframe with multiple column groups

Question

I currently have a wide dataframe that looks like this:

Index ID1 ID2 Foc_A Foc_B Foc_C Sat_A Sat_B Sat_C
   0   r  1   10    15    17    100   105   107
   1   r  2   20    25    27    110   115   117
   2   s  1   30    35    37    120   125   127
   3   s  2   40    45    47    130   135   137

Each entry has multiple identifier columns (ID1 and ID2). I then have two separate categories of measurements (Foc and Sat) each of which contains multiple identifiers (A, B, C)(the category identifiers are shared between categories). I'm eventually needing to plot it in a facet_grid with x and y as each category of measurement and separated by category identifier, so I'm trying to reshape it so that it looks like this:

Index ID1 ID2 Ch  Foc  Sat 
   0   r  1   A   10   100 
   1   r  1   B   15    105
   2   r  1   C   17    107
   3   r  2   A   20    110 
   4   r  2   B   25    115 
   5   r  2   C   27    117 
   6   s  1   A   30    120 
   7   s  1   B   35    125
   8   s  1   C   37    127

I've been trying.melt, .pivot, and.stack but not understanding what I'm doing well enough to make headway.

Answer 1

You are thinking in the right way. You can do:

# melt the dataframe
d1 = df.set_index(['Index', 'ID1', 'ID2']).stack().reset_index()

# create separate column
d1[['flag', 'Ch']] = d1['level_3'].str.split('_', expand=True)
d1 = d1.drop('level_3', 1)
d1 = d1.rename(columns = {0: 'Foc'})

# expand the dataframe to wide
d2 = pd.pivot_table(d1, index=['Index', 'ID1', 'ID2', 'Ch'], columns=['flag']).reset_index()

# fix column names
d2.columns = ['Index', 'ID1', 'ID2', 'Ch', 'Foc', 'Sat']

print(d2.head())

   Index ID1  ID2 Ch  Foc  Sat
0      0   r    1  A   10  100
1      0   r    1  B   15  105
2      0   r    1  C   17  107
3      1   r    2  A   20  110
4      1   r    2  B   25  115

Answer 2

I'd set ID columns to the index, split and expand the columns on the '_' character, then stack the dataframe:

from io import StringIO
import pandas

datafile = StringIO("""\
Index ID1 ID2 Foc_A Foc_B Foc_C Sat_A Sat_B Sat_C
   0   r  1   10    15    17    100   105   107
   1   r  2   20    25    27    110   115   117
   2   s  1   30    35    37    120   125   127
   3   s  2   40    45    47    130   135   137
""")

(
    pandas.read_csv(datafile, sep='\s+')
        .set_index(['ID1', 'ID2'])
        .drop(columns='Index')
        .pipe(lambda df:
             df.set_axis(
                 df.columns.str.split('_', expand=True),
                 axis="columns"
            )
        )
        .rename_axis([None, 'Ch'], axis='columns')
        .stack(level='Ch')
        .reset_index()
)

And that give me:

   ID1  ID2 Ch  Foc  Sat
0    r    1  A   10  100
1    r    1  B   15  105
2    r    1  C   17  107
3    r    2  A   20  110
4    r    2  B   25  115
5    r    2  C   27  117
6    s    1  A   30  120
7    s    1  B   35  125
8    s    1  C   37  127
9    s    2  A   40  130
10   s    2  B   45  135
11   s    2  C   47  137

Answer 3

Let us do wide_to_long

df = pd.wide_to_long(df,['Foc','Sat'],i=['ID1','ID2'],j='Ch',sep='_',suffix='\w+').reset_index()
Out[168]: 
   ID1  ID2 Ch  Foc  Sat
0    r    1  A   10  100
1    r    1  B   15  105
2    r    1  C   17  107
3    r    2  A   20  110
4    r    2  B   25  115
5    r    2  C   27  117
6    s    1  A   30  120
7    s    1  B   35  125
8    s    1  C   37  127
9    s    2  A   40  130
10   s    2  B   45  135
11   s    2  C   47  137

Answer 4

you could also achieve this with .melt , .groupby and np.where :

df = pd.melt(df, id_vars=['ID1','ID2','Foc_A', 'Foc_B', 'Foc_C'], var_name='Ch', value_name='Sat') \
    .groupby(['ID1','ID2','Ch']).agg({'Foc_A':'max','Foc_B':'max', 'Foc_C':'max','Sat':'max'}).reset_index()
df['Foc'] = np.where((df['Ch'] == 'Sat_A'), df['Foc_A'], '')
df['Foc'] = np.where((df['Ch'] == 'Sat_B'), df['Foc_B'], df['Foc'])
df['Foc'] = np.where((df['Ch'] == 'Sat_C'), df['Foc_C'], df['Foc'])
df['Ch'] = df['Ch'].str.replace('Sat_', '')
df = df.drop(['Foc_A', 'Foc_B', 'Foc_C'], axis=1)
df

output:

    ID1 ID2 Ch      Sat Foc
0   r   1   A       100 10
1   r   1   B       105 15
2   r   1   C       107 17
3   r   2   A       110 20
4   r   2   B       115 25
5   r   2   C       117 27
6   s   1   A       120 30
7   s   1   B       125 35
8   s   1   C       127 37
9   s   2   A       130 40
10  s   2   B       135 45
11  s   2   C       137 47

Answer 5

Creating a multi_index then using stack

df = df.set_index(['ID1','ID2'])

df.columns = df.columns.str.split('_',expand=True)

df1 = df.stack(1).reset_index().rename(columns={'level_2' : 'Ch'})

   ID1  ID2 Ch  Foc  Sat
0    r    1  A   10  100
1    r    1  B   15  105
2    r    1  C   17  107
3    r    2  A   20  110
4    r    2  B   25  115
5    r    2  C   27  117
6    s    1  A   30  120
7    s    1  B   35  125
8    s    1  C   37  127
9    s    2  A   40  130
10   s    2  B   45  135
11   s    2  C   47  137

Reshape Pandas Dataframe with multiple column groups

Question

5 answers

solution1
2 ACCPTED 2020-07-13 20:40:24

solution2
1 2020-07-13 20:50:01

solution3
1 2020-07-13 21:11:13

solution4
0 2020-07-13 20:56:48

solution5
0 2020-07-13 21:16:05

Reshape Pandas Dataframe with multiple column groups

Question

5 answers

solution1 2 ACCPTED 2020-07-13 20:40:24

solution2 1 2020-07-13 20:50:01

solution3 1 2020-07-13 21:11:13

solution4 0 2020-07-13 20:56:48

solution5 0 2020-07-13 21:16:05

solution1
2 ACCPTED 2020-07-13 20:40:24

solution2
1 2020-07-13 20:50:01

solution3
1 2020-07-13 21:11:13

solution4
0 2020-07-13 20:56:48

solution5
0 2020-07-13 21:16:05