Pandas - Aggregating several columns into one

Question

I have a dataframe with several categorical columns, and I want to aggregate all these into a single categorical column, preferably using Pandas.

For example, if I have two columns, named category1 (c1) and category2 (c2), both with data that range from 0 to 2, I want to aggregate them into another column, category (c), which can range from 0 to 5, representing all the possible combinations of categorical values.

I would go from this:

d1 d2 c1 c2
1  1  NA 0
2  1  1  1
3  1  0  2
4  2  2  NA
5  1  NA NA
6  2  2  2
7  2  0  NA
8  2  0  2

To this:

I tried following this, but it didn't seem to work and threw some errors, namely ValueError: cannot reindex from a duplicate axis.

Thanks in advance for any help.

Answer 1

IIUC, you can use ngroup with groupby.

df['c'] = df.fillna(-1).groupby(['c1', 'c2']).ngroup()

The numbering might not be the same as yours (groups are numbered in the sorted order of their (c1, c2) keys, with the filled -1 sorting first), but hopefully that's not important.

   d1  d2  c
0   1   1  1
1   2   1  4
2   3   1  3
3   4   2  5
4   5   1  0
5   6   2  6
6   7   2  2
7   8   2  3
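
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same fillna + groupby + ngroup idea. The names `filled` and `lookup` are made up for illustration, NaN is assumed to stand in for the NA entries, and the lookup table is an extra that is not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; NaN stands in for the NA entries.
df = pd.DataFrame({
    "d1": [1, 2, 3, 4, 5, 6, 7, 8],
    "d2": [1, 1, 1, 2, 1, 2, 2, 2],
    "c1": [np.nan, 1, 0, 2, np.nan, 2, 0, 0],
    "c2": [0, 1, 2, np.nan, np.nan, 2, np.nan, 2],
})

# Fill NA with a sentinel (-1) so rows with missing values still form groups;
# groupby drops NaN keys by default.
filled = df[["c1", "c2"]].fillna(-1)

# ngroup() numbers each distinct (c1, c2) pair, following the sorted key order.
df["c"] = filled.groupby(["c1", "c2"]).ngroup()

# Hypothetical helper table: maps each code back to its pair (-1 means NA).
lookup = (
    filled.assign(c=df["c"])
    .drop_duplicates()
    .sort_values("c")
    .reset_index(drop=True)
)

print(df[["d1", "d2", "c"]])
print(lookup)
```

Only combinations that actually occur in the data get a code, so the sample yields 7 codes (0 to 6) rather than one per theoretically possible pair.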

Answer 2

We can chain wide_to_long with dropna and drop_duplicates, which stacks c1 and c2 into a single column c:

Newdf = (pd.wide_to_long(df, ['c'], i=['d1', 'd2'], j='drop')
         .dropna()
         .reset_index(level=[0, 1])
         .drop_duplicates())
Newdf
Out[53]: 
      d1  d2    c
drop             
2      1   1  0.0
1      2   1  1.0
1      3   1  0.0
2      3   1  2.0
1      4   2  2.0
1      6   2  2.0
1      7   2  0.0
1      8   2  0.0
2      8   2  2.0
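
A self-contained version of the same chain, in case it helps to try it on the question's data. This is only a sketch: the frame is rebuilt by hand with NaN assumed for the NA entries, and `long_df` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; NaN stands in for the NA entries.
df = pd.DataFrame({
    "d1": [1, 2, 3, 4, 5, 6, 7, 8],
    "d2": [1, 1, 1, 2, 1, 2, 2, 2],
    "c1": [np.nan, 1, 0, 2, np.nan, 2, 0, 0],
    "c2": [0, 1, 2, np.nan, np.nan, 2, np.nan, 2],
})

long_df = (
    # Stack c1 and c2 into one column "c"; the numeric suffix goes to index level "drop".
    pd.wide_to_long(df, stubnames=["c"], i=["d1", "d2"], j="drop")
    .dropna()                            # discard the NA entries
    .reset_index(level=["d1", "d2"])     # move d1 and d2 back to columns
    .drop_duplicates()                   # keep one row per distinct (d1, d2, c)
)

print(long_df)
```

Note that this gives a long-format result with one row per distinct non-NA value, so a row such as d1 = 8 appears twice and the row where both c1 and c2 are NA (d1 = 5) disappears. That is a different shape from the single combined code produced with ngroup.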

solution1
2 ACCEPTED 2019-11-10 00:28:11

solution2
0 2019-11-10 00:25:31
