Unmelt only part of a column from pandas dataframe

Question

I have the following example dataframe:

df = pd.DataFrame(data = {'RecordID' : [1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5], 'DisplayLabel' : ['Source','Test','Value 1','Value 2','Value3','Source','Test','Value 1','Value 2','Source','Test','Value 1','Value 2','Source','Test','Value 1','Value 2','Source','Test','Value 1','Value 2'],
'Value' : ['Web','Logic','S','I','Complete','Person','Voice','>20','P','Mail','OCR','A','I','Dictation','Understandable','S','I','Web','Logic','R','S']})

which creates this dataframe:

+-------+----------+---------------+----------------+
| Index | RecordID | Display Label |     Value      |
+-------+----------+---------------+----------------+
|     0 |        1 | Source        | Web            |
|     1 |        1 | Test          | Logic          |
|     2 |        1 | Value 1       | S              |
|     3 |        1 | Value 2       | I              |
|     4 |        1 | Value 3       | Complete       |
|     5 |        2 | Source        | Person         |
|     6 |        2 | Test          | Voice          |
|     7 |        2 | Value 1       | >20            |
|     8 |        2 | Value 2       | P              |
|     9 |        3 | Source        | Mail           |
|    10 |        3 | Test          | OCR            |
|    11 |        3 | Value 1       | A              |
|    12 |        3 | Value 2       | I              |
|    13 |        4 | Source        | Dictation      |
|    14 |        4 | Test          | Understandable |
|    15 |        4 | Value 1       | S              |
|    16 |        4 | Value 2       | I              |
|    17 |        5 | Source        | Web            |
|    18 |        5 | Test          | Logic          |
|    19 |        5 | Value 1       | R              |
|    20 |        5 | Value 2       | S              |
+-------+----------+---------------+----------------+

I am trying to "unmelt" though not exactly the source and test columns into new dataframe Columns such that it will look like this:

+-------+----------+-----------+----------------+---------------+----------+
| Index | RecordID |  Source   |      Test      | Result        |  Value   |
+-------+----------+-----------+----------------+---------------+----------+
|     0 |        1 | Web       | Logic          | Value 1       | S        |
|     1 |        1 | Web       | Logic          | Value 2       | I        |
|     2 |        1 | Web       | Logic          | Value 3       | Complete |
|     3 |        2 | Person    | Voice          | Value 1       | >20      |
|     4 |        2 | Person    | Voice          | Value 2       | P        |
|     5 |        3 | Mail      | OCR            | Value 1       | A        |
|     6 |        3 | Mail      | OCR            | Value 2       | I        |
|     7 |        4 | Dictation | Understandable | Value 1       | S        |
|     8 |        4 | Dictation | Understandable | Value 2       | I        |
|     9 |        5 | Web       | Logic          | Value 1       | R        |
|    10 |        5 | Web       | Logic          | Value 2       | S        |
+-------+----------+-----------+----------------+---------------+----------+

It's my understanding that pivot and melt will do the entire DisplayLabel column and not just some of the values.

Any help would be greatly appreciated as I have read the Pandas Melt and the Pandas Pivot as well as some references on stackoverflow and I can't seem to figure out a way to do this quickly.

Thanks!

Answer 1

`set_index` , `unstack` , then `melt`

df.set_index(['RecordID', 'DisplayLabel']).Value.unstack().reset_index() \
  .melt(['RecordID', 'Source', 'Test'], var_name='Result', value_name='Value') \
  .sort_values('RecordID').dropna(subset=['Value'])

    RecordID     Source            Test   Result     Value
0          1        Web           Logic  Value 1         S
5          1        Web           Logic  Value 2         I
10         1        Web           Logic  Value 3  Complete
1          2     Person           Voice  Value 1       >20
6          2     Person           Voice  Value 2         P
2          3       Mail             OCR  Value 1         A
7          3       Mail             OCR  Value 2         I
3          4  Dictation  Understandable  Value 1         S
8          4  Dictation  Understandable  Value 2         I
4          5        Web           Logic  Value 1         R
9          5        Web           Logic  Value 2         S

Custom function for `groupby`

def f(t):
    name, df = t
    d = dict(zip(df['DisplayLabel'], df['Value']))
    source = d.pop('Source')
    test = d.pop('Test')
    result, value = zip(*d.items())

    return pd.DataFrame(
        dict(RecordID=name, Source=source, Test=test, Result=result, Value=value)
    )

pd.concat(map(f, df.groupby('RecordID')))

   RecordID     Source            Test   Result     Value
0         1        Web           Logic  Value 1         S
1         1        Web           Logic  Value 2         I
2         1        Web           Logic  Value 3  Complete
0         2     Person           Voice  Value 1       >20
1         2     Person           Voice  Value 2         P
0         3       Mail             OCR  Value 1         A
1         3       Mail             OCR  Value 2         I
0         4  Dictation  Understandable  Value 1         S
1         4  Dictation  Understandable  Value 2         I
0         5        Web           Logic  Value 1         R
1         5        Web           Logic  Value 2         S

Setup

df = pd.DataFrame(data={
    'RecordID': [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5],
    'DisplayLabel': [
        'Source', 'Test', 'Value 1', 'Value 2', 'Value 3',
        'Source', 'Test', 'Value 1', 'Value 2',
        'Source', 'Test', 'Value 1', 'Value 2',
        'Source', 'Test', 'Value 1', 'Value 2',
        'Source', 'Test', 'Value 1', 'Value 2'
    ],
    'Value': [
        'Web', 'Logic', 'S', 'I', 'Complete',
        'Person', 'Voice', '>20', 'P',
        'Mail', 'OCR', 'A', 'I',
        'Dictation', 'Understandable', 'S', 'I',
        'Web', 'Logic', 'R', 'S'
    ]
})

Answer 2

We can achieve your result by applying logic and pivotting, we split your data by checking if DisplayLabel contains Value and then we join them back together:

mask = df['DisplayLabel'].str.contains('Value')
df2 = df[~mask].pivot(index='RecordID', columns='DisplayLabel', values='Value')

dfpiv = (
    df[mask].rename(columns={'DisplayLabel':'Result'})
            .set_index('RecordID')
            .join(df2)
            .reset_index()
)

    RecordID   Result     Value     Source            Test
0          1  Value 1         S        Web           Logic
1          1  Value 2         I        Web           Logic
2          1   Value3  Complete        Web           Logic
3          2  Value 1       >20     Person           Voice
4          2  Value 2         P     Person           Voice
5          3  Value 1         A       Mail             OCR
6          3  Value 2         I       Mail             OCR
7          4  Value 1         S  Dictation  Understandable
8          4  Value 2         I  Dictation  Understandable
9          5  Value 1         R        Web           Logic
10         5  Value 2         S        Web           Logic

If you want the exact column order as your example, use DataFrame.reindex :

dfpiv.reindex(columns=['RecordID', 'Source', 'Test', 'Result', 'Value'])


    RecordID     Source            Test   Result     Value
0          1        Web           Logic  Value 1         S
1          1        Web           Logic  Value 2         I
2          1        Web           Logic   Value3  Complete
3          2     Person           Voice  Value 1       >20
4          2     Person           Voice  Value 2         P
5          3       Mail             OCR  Value 1         A
6          3       Mail             OCR  Value 2         I
7          4  Dictation  Understandable  Value 1         S
8          4  Dictation  Understandable  Value 2         I
9          5        Web           Logic  Value 1         R
10         5        Web           Logic  Value 2         S

In detail - step by step:

# mask all rows where "Value" is in column DisplayLabel
mask = df['DisplayLabel'].str.contains('Value')

0     False
1     False
2      True
3      True
4      True
5     False
6     False
7      True
8      True
9     False
10    False
11     True
12     True
13    False
14    False
15     True
16     True
17    False
18    False
19     True
20     True
Name: DisplayLabel, dtype: bool

# select all rows which do NOT have "Value" in DisplayLabel
df[~mask]

    RecordID DisplayLabel           Value
0          1       Source             Web
1          1         Test           Logic
5          2       Source          Person
6          2         Test           Voice
9          3       Source            Mail
10         3         Test             OCR
13         4       Source       Dictation
14         4         Test  Understandable
17         5       Source             Web
18         5         Test           Logic

# pivot the values in DisplayLabel to columns
df2 = df[~mask].pivot(index='RecordID', columns='DisplayLabel', values='Value')

DisplayLabel     Source            Test
RecordID                               
1                   Web           Logic
2                Person           Voice
3                  Mail             OCR
4             Dictation  Understandable
5                   Web           Logic

df[mask].rename(columns={'DisplayLabel':'Result'}) # rename the column DisplayLabel to Result
            .set_index('RecordID')                 # set RecordId as index so we can join df2 
            .join(df2)                             # join df2 back to our dataframe based RecordId
            .reset_index()                         # reset index so we get RecordId back as column

    RecordID   Result     Value     Source            Test
0          1  Value 1         S        Web           Logic
1          1  Value 2         I        Web           Logic
2          1   Value3  Complete        Web           Logic
3          2  Value 1       >20     Person           Voice
4          2  Value 2         P     Person           Voice
5          3  Value 1         A       Mail             OCR
6          3  Value 2         I       Mail             OCR
7          4  Value 1         S  Dictation  Understandable
8          4  Value 2         I  Dictation  Understandable
9          5  Value 1         R        Web           Logic
10         5  Value 2         S        Web           Logic

Answer 3

I tried a different approach of first pivot ing using unstack and then partially converting wide_to_long (Sorry if it's not efficient but this seems to get the desired output)

# first converting all long to wide
df2 = df.set_index(['RecordID','DisplayLabel']).unstack()
# flattening the unstacked columns
df2.columns = df2.columns.to_series().str.join('_')
df2.columns = df2.columns.str.replace('Value_','',regex=True) #just removing the junk in the column name
df2 = df2.reset_index() #resetting index to access RecordID

df2 = (pd.melt(df2,id_vars=['RecordID',"Source","Test"],var_name='Result', value_name='Value')
.sort_values(['RecordID',"Source","Test"])
.dropna()
.reset_index())

index   RecordID    Source  Test    Result  Value
0   0   1   Web Logic   Value 1 S
1   5   1   Web Logic   Value 2 I
2   10  1   Web Logic   Value 3 Complete
3   1   2   Person  Voice   Value 1 >20
4   6   2   Person  Voice   Value 2 P
5   2   3   Mail    OCR Value 1 A
6   7   3   Mail    OCR Value 2 I
7   3   4   Dictation   Understandable  Value 1 S
8   8   4   Dictation   Understandable  Value 2 I
9   4   5   Web Logic   Value 1 R
10  9   5   Web Logic   Value 2 S

Unmelt only part of a column from pandas dataframe

Question

3 answers

solution1
7 2020-01-21 18:54:37

`set_index` , `unstack` , then `melt`

Custom function for `groupby`

Setup

solution2
7 ACCPTED 2020-01-21 18:55:15

In detail - step by step:

solution3
0 2020-01-21 19:22:03

Unmelt only part of a column from pandas dataframe

Question

3 answers

solution1 7 2020-01-21 18:54:37

set_index , unstack , then melt

Custom function for groupby

Setup

solution2 7 ACCPTED 2020-01-21 18:55:15

In detail - step by step:

solution3 0 2020-01-21 19:22:03

solution1
7 2020-01-21 18:54:37

`set_index` , `unstack` , then `melt`

Custom function for `groupby`

solution2
7 ACCPTED 2020-01-21 18:55:15

solution3
0 2020-01-21 19:22:03