Pandas groupby and replace duplicates with empty string

Question

I have a dataframe like the following:

import pandas as pd

d = {'one':[1,1,1,1,2, 2, 2, 2],
     'two':['a','a','a','b', 'a','a','b','b'],
     'letter':['a','b','c','a', 'a', 'b', 'a', 'b']}

df = pd.DataFrame(d)
>>> df
   one two letter
0    1   a      a
1    1   a      b
2    1   a      c
3    1   b      a
4    2   a      a
5    2   a      b
6    2   b      a
7    2   b      b

I am trying to convert it to a dataframe like the following, where the repeated values are replaced with the empty string '':

one  two  letter
1    a    a        
          b        
          c         
     b    a         
2    a    a         
          b         
     b    a         
          b

When I perform groupby with all columns I get a series object that is basically exactly what I am looking for, but not a dataframe:

df.groupby(df.columns.tolist()).size()   
1    a    a         1
          b         1
          c         1
     b    a         1
2    a    a         1
          b         1
     b    a         1
          b         1

How can I get the desired dataframe?

Answer 1

You can build a mask that is True wherever a value differs from the value in the row above, then use where to keep those values and change the rest to a blank string:

df[['one','two']] = df[['one','two']].where(df[['one', 'two']].apply(lambda x: x != x.shift()), '')

>>> df
  one two letter
0   1   a      a
1              b
2              c
3       b      a
4   2   a      a
5              b
6       b      a
7              b

Some explanation:

Your mask looks like this:

>>> df[['one', 'two']].apply(lambda x: x != x.shift())
     one    two
0   True   True
1  False  False
2  False  False
3  False   True
4   True   True
5  False  False
6  False   True
7  False  False

All that where does is keep the values where the mask is True and replace the rest with ''.
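
One caveat with the column-by-column comparison: each column is compared only with its own previous row, so a value in two that repeats across a change in one would be blanked too. The sample data never hits that case. Below is a minimal sketch of a variant that treats the columns hierarchically, assuming the frame is already sorted by the grouping columns. blank_repeats is a hypothetical helper name, not a pandas function, and the small frame is made up to trigger the edge case:

```python
import pandas as pd


def blank_repeats(frame, cols):
    """Return a copy of `frame` with repeated values in `cols` replaced by ''.

    A value is kept when it, or any column listed before it in `cols`,
    differs from the row above. Hypothetical helper, for illustration only.
    """
    out = frame.copy()
    changed = pd.Series(False, index=frame.index)
    for col in cols:
        # A new group starts if this column or any outer column changed.
        changed = changed | frame[col].ne(frame[col].shift())
        # Cast to object so '' can be stored next to integers.
        out[col] = frame[col].astype(object).where(changed, '')
    return out


# Illustrative data: 'two' stays 'b' while 'one' changes from 1 to 2.
df = pd.DataFrame({'one': [1, 1, 2, 2],
                   'two': ['a', 'b', 'b', 'b'],
                   'letter': ['a', 'b', 'a', 'b']})
result = blank_repeats(df, ['one', 'two'])
print(result)
```

Here one comes out as [1, '', 2, ''] and two as ['a', 'b', 'b', '']. The plain column-wise mask would also blank the 'b' in row 2, even though it belongs to a new value of one.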

Answer 2

The solution to the original problem is to find the duplicated cells in each of the first two columns and set them to empty:

df.loc[df.duplicated(subset=['one', 'two']), 'two'] = ''
df.loc[df.duplicated(subset=['one']),        'one'] = ''

However, the purpose of this transformation is unclear. Perhaps you are trying to solve the wrong problem.
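
Two details of the duplicated approach are worth spelling out. First, the order of the two assignments matters, because blanking one first would change what counts as a duplicate of ['one', 'two']. Second, duplicated flags every later occurrence of a value, not only the one directly below, so on unsorted data it behaves differently from a comparison with the previous row. Here is a minimal sketch that computes both masks up front and works on a copy. The cast to object is an assumption about recent pandas versions, which warn about (and may reject) writing '' into an int64 column:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 1, 1, 1, 2, 2, 2, 2],
                   'two': ['a', 'a', 'a', 'b', 'a', 'a', 'b', 'b'],
                   'letter': ['a', 'b', 'c', 'a', 'a', 'b', 'a', 'b']})

# Compute both masks before changing anything, so the order of the
# assignments below no longer matters.
dup_two = df.duplicated(subset=['one', 'two'])
dup_one = df.duplicated(subset=['one'])

out = df.copy()
# Cast the integer column to object so it can hold '' alongside numbers.
out['one'] = out['one'].astype(object)
out.loc[dup_two, 'two'] = ''
out.loc[dup_one, 'one'] = ''
print(out)
```

For the sample data this reproduces the layout asked for in the question.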

2 answers

solution1
1 ACCEPTED 2018-08-02 05:29:24

solution2
0 2018-08-02 05:34:27
