
How to aggregate columns in pandas

I have a data frame with a number of columns. I want to group them under two main groups, A and B, while preserving the old column names as dictionary keys, as follows:

index userid  col1  col2  col3  col4  col5 col6 col7
0      1        6    3    Nora  100    11   22    44

The desired data frame is as follows:

index userid                 A                                             B 
0      1    {"col1":6, "col2":3, "col3":"Nora","col4":100}    {"col5":11, "col6":22, "col7":44}

You can try something like this:

d = {'col1': 'A',
     'col2': 'A',
     'col3': 'A',
     'col4': 'A',
     'col5': 'B',
     'col6': 'B', 
     'col7': 'B'}

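# Note: groupby(..., axis=1) is deprecated since pandas 2.1, so this line may warn or fail on newer versions
df.groupby(d, axis=1).apply(pd.DataFrame.to_dict, orient='series').to_frame().T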

Output:

                                                   A                                           B
0  {'col1': [6], 'col2': [3], 'col3': ['Nora'], '...  {'col5': [11], 'col6': [22], 'col7': [44]}
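
If grouping along the column axis is not available in your pandas version, the same mapping `d` can drive a plain loop instead. This is only a minimal sketch: the name `result` and the sample frame are assumptions for illustration, and it produces one dict of scalars per row rather than the list-valued dicts shown above.

```python
import pandas as pd

# Sample data assumed from the question
df = pd.DataFrame({'userid': [1], 'col1': [6], 'col2': [3], 'col3': ['Nora'],
                   'col4': [100], 'col5': [11], 'col6': [22], 'col7': [44]})

d = {'col1': 'A', 'col2': 'A', 'col3': 'A', 'col4': 'A',
     'col5': 'B', 'col6': 'B', 'col7': 'B'}

# Invert the mapping: group name -> list of source columns
groups = {}
for col, group in d.items():
    groups.setdefault(group, []).append(col)

# Keep every column that is not assigned to a group (here only userid)
result = df.drop(columns=list(d))

# Build one dict per row for each group
for group, cols in groups.items():
    result[group] = df[cols].to_dict(orient='records')

print(result)
```

Because the groups come from `d`, adding a column to a group or introducing a third group only requires editing the mapping.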

To match your desired dataframe exactly:

>>> import pandas as pd

# recreating your data
>>> df = pd.DataFrame.from_dict({'index': [0], 'userid': [1], 'col1': [6], 'col2': [3], 'col3': ['Nora'], 'col4': [100], 'col5': [11], 'col6': [22], 'col7': [44]})

# copy of unchanged columns
>>> df_new = df[['index', 'userid']].copy()

# grouping columns together
>>> df_new['A'] = df[['col1', 'col2', 'col3', 'col4']].copy().to_dict(orient='records')
>>> df_new['B'] = df[['col5', 'col6', 'col7']].copy().to_dict(orient='records')

>>> df_new
   index  userid                                                    A                                     B
0      0       1  {'col1': 6, 'col2': 3, 'col3': 'Nora', 'col4': 100}  {'col5': 11, 'col6': 22, 'col7': 44}
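
The desired output in the question uses double quotes, which may mean JSON strings are wanted rather than Python dicts. If so, one possible approach is `DataFrame.to_json` with `orient='records'` and `lines=True`, which writes one JSON object per row. This is a sketch under that assumption; the name `df_json` is hypothetical.

```python
import json
import pandas as pd

# Sample data assumed from the question
df = pd.DataFrame({'userid': [1], 'col1': [6], 'col2': [3], 'col3': ['Nora'],
                   'col4': [100], 'col5': [11], 'col6': [22], 'col7': [44]})

df_json = df[['userid']].copy()

# lines=True emits one JSON object per row, separated by newlines,
# so splitting on lines yields one string per row
df_json['A'] = df[['col1', 'col2', 'col3', 'col4']].to_json(orient='records', lines=True).splitlines()
df_json['B'] = df[['col5', 'col6', 'col7']].to_json(orient='records', lines=True).splitlines()

# Each cell is now a JSON string that parses back into a dict
parsed_a = json.loads(df_json.loc[0, 'A'])
parsed_b = json.loads(df_json.loc[0, 'B'])
print(df_json)
```

Storing strings instead of dicts is usually easier to write to CSV or a database, at the cost of having to parse the cell again before using it.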

Alternatively, you can add the new columns to the original dataframe and then drop the old ones:

import pandas as pd

df1 = pd.DataFrame({'index':[0], 'userid':[1],   
                    'col1': [6],   'col2': [3],  'col3': ['Nora']  ,'col4':[100],  
                    'col5':[11],  'col6': [22], 'col7':[44]})

df1['A'] = df1[['col1', 'col2', 'col3', 'col4']].to_dict(orient='records')
df1['B'] = df1[['col5', 'col6', 'col7']].to_dict(orient='records')

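# drop the original columns by name now that their values are stored in A and B
df1 = df1.drop(columns=['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7'])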

print(df1)
