I have a dataframe like the following:
import pandas as pd
d = {'one': [1, 1, 1, 1, 2, 2, 2, 2],
     'two': ['a', 'a', 'a', 'b', 'a', 'a', 'b', 'b'],
     'letter': ['a', 'b', 'c', 'a', 'a', 'b', 'a', 'b']}
df = pd.DataFrame(d)
   one two letter
0    1   a      a
1    1   a      b
2    1   a      c
3    1   b      a
4    2   a      a
5    2   a      b
6    2   b      a
7    2   b      b
And I am trying to convert it to a dataframe like the following, where empty cells are filled with empty string '':
one two letter
1   a   a
        b
        c
    b   a
2   a   a
        b
    b   a
        b
When I perform groupby with all columns I get a series object that is basically exactly what I am looking for, but not a dataframe:
df.groupby(df.columns.tolist()).size()
1  a  a    1
      b    1
      c    1
   b  a    1
2  a  a    1
      b    1
   b  a    1
      b    1
How can I get the desired dataframe?
You can build a mask that is True wherever a value differs from the value in the row above, then use where to replace everything else with a blank string:
df[['one','two']] = df[['one','two']].where(df[['one', 'two']].apply(lambda x: x != x.shift()), '')
>>> df
  one two letter
0   1   a      a
1              b
2              c
3       b      a
4   2   a      a
5              b
6       b      a
7              b
Some explanation:
Your mask looks like this:
>>> df[['one', 'two']].apply(lambda x: x != x.shift())
     one    two
0   True   True
1  False  False
2  False  False
3  False   True
4   True   True
5  False  False
6  False   True
7  False  False
All that where is doing is keeping the values where the mask is True and replacing the rest with ''.
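One thing to watch: the mask compares each column with its own previous row only, so if 'one' changes while 'two' happens to keep the same value, 'two' would be blanked at the start of the new group. That does not occur in the sample data, but a minimal sketch that guards against it could look like the following. The helper name blank_repeats is hypothetical, and the cummax step is an addition to the answer above, not part of it.

```python
import pandas as pd

df = pd.DataFrame({
    'one': [1, 1, 1, 1, 2, 2, 2, 2],
    'two': ['a', 'a', 'a', 'b', 'a', 'a', 'b', 'b'],
    'letter': ['a', 'b', 'c', 'a', 'a', 'b', 'a', 'b'],
})

def blank_repeats(frame, cols):
    """Blank values in `cols` that repeat the row above (hypothetical helper)."""
    out = frame.copy()
    # True where a column's value differs from the row directly above it.
    changed = out[cols].ne(out[cols].shift())
    # Assumption: `cols` is ordered outer to inner, so a change in an outer
    # column should also restart every column to its right.
    changed = changed.cummax(axis=1)
    # Cast to object so '' can sit in the same column as the integers.
    out[cols] = out[cols].astype(object).where(changed, '')
    return out

result = blank_repeats(df, ['one', 'two'])
print(result)
```

The function works on a copy, so df itself is left unchanged. Like the one-liner, it only looks at consecutive rows, so it assumes the frame is already sorted by the grouping columns.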
The solution to the original problem is to find the duplicated cells in each of the first two columns and set them to the empty string:
df.loc[df.duplicated(subset=['one', 'two']), 'two'] = ''
df.loc[df.duplicated(subset=['one']), 'one'] = ''
However, the purpose of this transformation is unclear. Perhaps you are trying to solve the wrong problem.
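For reference, here is a self-contained sketch of the duplicated approach applied to the sample data. The cast to object is an addition, not part of the answer: as far as I know, recent pandas releases warn when a string is written into an integer column through .loc, and casting first sidesteps that. The variable name out is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'one': [1, 1, 1, 1, 2, 2, 2, 2],
    'two': ['a', 'a', 'a', 'b', 'a', 'a', 'b', 'b'],
    'letter': ['a', 'b', 'c', 'a', 'a', 'b', 'a', 'b'],
})

out = df.copy()
# Assumption: cast up front so that writing '' into 'one' does not
# conflict with its integer dtype.
out[['one', 'two']] = out[['one', 'two']].astype(object)
# Blank 'two' on rows whose ('one', 'two') pair has already appeared.
# This must run first, while 'one' still holds its original values.
out.loc[out.duplicated(subset=['one', 'two']), 'two'] = ''
# Then blank 'one' on rows whose 'one' value has already appeared.
out.loc[out.duplicated(subset=['one']), 'one'] = ''
print(out)
```

Note that duplicated flags a value that has appeared anywhere earlier in the frame, not just in the row directly above, so this gives the grouped look only when the rows are already sorted by 'one' and 'two'.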