简体   繁体   中英

Vertically stack multiple dataframes in pandas using different keys

I have a dataframe in Pandas like below -

import pandas as pd
data = {'Category': ['cat2','cat1','cat2','cat1','cat2','cat1','cat2','cat1','cat1','cat1','cat2'],
        'values': [1,2,3,1,2,3,1,2,3,5,1]}
my_data = pd.DataFrame(data)

I get the minimum, maximum and average values per category values and I also rename the column names as shown below -

# Get the minimum value for each column
min_agg = my_data.groupby('Category').min().reset_index()
min_agg.columns = [str(col) + '_min' for col in min_agg.columns]

   category_min  values_min
0   cat1            1
1   cat2            1

# Get the minimum value for each column
max_agg = my_data.groupby('Category').max().reset_index()
max_agg.columns = [str(col) + '_max' for col in max_agg.columns]


   category_max  values_max
0   cat1            5
1   cat2            3

# Get average value for each column
avg_agg = my_data.groupby('Category').mean().reset_index()
avg_agg.columns = [str(col) + '_avg' for col in avg_agg.columns]


   category_avg  values_avg
0   cat1           2.666667
1   cat2           1.600000


# count number of zeros
zeros_agg = my_data.groupby('Category')['values'].apply(lambda column: (column == 0).sum()).reset_index()
zeros_agg.columns = [str(col) + '_zeros' for col in zeros_agg.columns]



   Category_zeros  values_zeros
0   cat1                0
1   cat2                0

Now, I want to vertically stack these 4 dataframes to get a final dataframe that has 8 records, 2 each per dataframe which looks like this -

category            values
cat1_min            1
cat2_min            1
cat1_max            5
cat2_max            3
cat1_avg           2.666667
cat2_avg           1.600000
cat1_zeros          0
cat2_zeros          0

The first column in the output shows what aggregation is applied on which category and second column shows the corresponding value.

How do I do this using pandas?

I tried

vertical_stack = pd.concat([min_agg, max_agg, avg_agg,zeros_agg], axis=0 , keys=['Category_min','Category_max','Category_avg','Category_zeros'])

but it did not give me expected output.

The aggregation shown here is only on 1 column but I have a bigger dataset and I am computing this aggregations on many columns.

You can pivot on category, with your desired aggregation functions, then stack the result:

r = df.pivot_table(
    columns='Category', values='values', aggfunc=['min', 'max', 'mean']) 
r.columns = r.columns.map('_'.join)  

r.T                                                                                                                                                  

             values
min_cat1   1.000000
min_cat2   1.000000
max_cat1   5.000000
max_cat2   3.000000
mean_cat1  2.666667
mean_cat2  1.600000

This is similar to

r = df.groupby('Category')['values'].agg(['min', 'max', 'mean']).stack() 
r.index = r.index.map('_'.join)

r
cat1_min     1.000000
cat1_max     5.000000
cat1_mean    2.666667
cat2_min     1.000000
cat2_max     3.000000
cat2_mean    1.600000
dtype: float64

I will take advantage of describe

s=df.groupby('Category')['values'].describe()[['mean','max','min']].stack()
s.index=s.index.map('_'.join)
s
cat1_mean    2.666667
cat1_max     5.000000
cat1_min     1.000000
cat2_mean    1.600000
cat2_max     3.000000
cat2_min     1.000000
dtype: float64

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM