Pandas - Aggregating several columns into one

I have a dataframe with several categorical columns, and I want to aggregate all these into a single categorical column, preferably using Pandas.

For example, if I have two columns, category1 (c1) and category2 (c2), both with values ranging from 0 to 2, I want to aggregate them into another column, category (c), which can range from 0 to 5, with each value representing one combination of the categorical values.

I would go from this:

d1 d2 c1 c2
1  1  NA 0
2  1  1  1
3  1  0  2
4  2  2  NA
5  1  NA NA
6  2  2  2
7  2  0  NA
8  2  0  2

To this:

d1 d2 c
1  1  0
2  1  1
3  1  2
4  2  3
5  1  4
6  2  5 
7  2  6
8  2  2

I tried following this, but it didn't work and raised errors, namely ValueError: cannot reindex from a duplicate axis.

Thanks in advance for any help.

IIUC, you can use ngroup with groupby.

df['c'] = df.fillna(-1).groupby(['c1', 'c2']).ngroup()

The numbering might be arbitrary (i.e. not the same as yours), but hopefully that's not important.


   d1  d2  c
0   1   1  1
1   2   1  4
2   3   1  3
3   4   2  5
4   5   1  0
5   6   2  6
6   7   2  2
7   8   2  3
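
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the ngroup approach. The DataFrame is rebuilt from the question's sample, with NA assumed to be NaN, and the variable names (keys, sorted_codes, first_seen_codes, result) are only illustrative. It also shows groupby's sort=False option, which numbers the groups in order of first appearance instead of sorted key order.

```python
import numpy as np
import pandas as pd

# Rebuild the question's sample data; NA is assumed to be NaN here.
df = pd.DataFrame({
    "d1": [1, 2, 3, 4, 5, 6, 7, 8],
    "d2": [1, 1, 1, 2, 1, 2, 2, 2],
    "c1": [np.nan, 1, 0, 2, np.nan, 2, 0, 0],
    "c2": [0, 1, 2, np.nan, np.nan, 2, np.nan, 2],
})

# By default groupby excludes NaN keys from the groups, so replace NaN
# with a sentinel (-1) that cannot collide with the real categories 0..2.
keys = df[["c1", "c2"]].fillna(-1)

# Default (sort=True): groups are numbered in sorted key order.
sorted_codes = keys.groupby(["c1", "c2"]).ngroup()

# sort=False: groups are numbered in order of first appearance.
first_seen_codes = keys.groupby(["c1", "c2"], sort=False).ngroup()

# Keep only the data columns plus the combined category.
result = df[["d1", "d2"]].assign(c=first_seen_codes)
print(result)
```

On this sample, sorted_codes gives the column shown above (1, 4, 3, 5, 0, 6, 2, 3), while first_seen_codes gives 0, 1, 2, 3, 4, 5, 6, 2, which is the numbering asked for in the question.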

Alternatively, we can chain wide_to_long with dropna and drop_duplicates:

Newdf=pd.wide_to_long(df,['c'],i=['d1','d2'],j='drop').dropna().reset_index(level=[0,1]).drop_duplicates()
Newdf
Out[53]: 
      d1  d2    c
drop             
2      1   1  0.0
1      2   1  1.0
1      3   1  0.0
2      3   1  2.0
1      4   2  2.0
1      6   2  2.0
1      7   2  0.0
1      8   2  0.0
2      8   2  2.0
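
The one-liner above can be split into steps to see what each call contributes. This is just a sketch on the question's sample data (NA assumed to be NaN); long_df and new_df are illustrative names.

```python
import numpy as np
import pandas as pd

# Rebuild the question's sample data; NA is assumed to be NaN here.
df = pd.DataFrame({
    "d1": [1, 2, 3, 4, 5, 6, 7, 8],
    "d2": [1, 1, 1, 2, 1, 2, 2, 2],
    "c1": [np.nan, 1, 0, 2, np.nan, 2, 0, 0],
    "c2": [0, 1, 2, np.nan, np.nan, 2, np.nan, 2],
})

# Step 1: stack c1 and c2 into a single column "c". The numeric suffix
# (1 or 2) becomes a new index level, named "drop" as in the answer.
long_df = pd.wide_to_long(df, stubnames=["c"], i=["d1", "d2"], j="drop")

# Step 2: discard the missing entries.
long_df = long_df.dropna()

# Step 3: move d1 and d2 back into columns, then remove rows where the
# same (d1, d2, c) triple occurs twice.
new_df = long_df.reset_index(level=["d1", "d2"]).drop_duplicates()
print(new_df)
```

Note that this produces one row per distinct non-missing value rather than a single combined code per original row: d1 = 3 and d1 = 8 each appear twice, and d1 = 5, where both columns are NA, disappears.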

