How to aggregate, combining dataframes, with pandas groupby

Question

I have a dataframe df and a column df['table'] such that each item in df['table'] is another dataframe with the same headers/number of columns. I was wondering if there's a way to do a groupby like this:

Original dataframe:

name    table
Bob     Pandas df1
Joe     Pandas df2
Bob     Pandas df3
Bob     Pandas df4
Emily   Pandas df5

After groupby:

name    table
Bob     Pandas df containing the appended df1, df3, and df4
Joe     Pandas df2
Emily   Pandas df5

I found this code snippet that does a groupby with a lambda to join strings in a dataframe, but I haven't been able to figure out how to append entire dataframes within a groupby.

df['table'] = df.groupby(['name'])['table'].transform(lambda x : ' '.join(x))

I've also tried df['table'] = df.groupby(['name'])['HTML'].apply(list), but that gives me a df['table'] of all NaN.
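The all-NaN result is consistent with how pandas aligns on the index when assigning a column. Below is a minimal sketch that uses a hypothetical string column `tag` in place of the question's data: `transform` returns one value per original row, indexed like `df`, whereas `apply(list)` returns one value per group, indexed by `name`, so assigning it back to `df` finds no matching labels.

```python
import pandas as pd

# Hypothetical stand-in for the question's data: a plain string column "tag"
df = pd.DataFrame({
    "name": ["Bob", "Joe", "Bob", "Bob", "Emily"],
    "tag": ["t1", "t2", "t3", "t4", "t5"],
})

# transform returns one value per ORIGINAL row, indexed like df,
# so assigning it back lines up row by row
df["joined"] = df.groupby("name")["tag"].transform(lambda x: " ".join(x))

# apply(list) returns one value per GROUP, indexed by the group key
per_group = df.groupby("name")["tag"].apply(list)
print(per_group.index.tolist())  # ['Bob', 'Emily', 'Joe']

# Column assignment aligns on the index: df has labels 0..4 while
# per_group has 'Bob', 'Emily', 'Joe', so no label matches
df["as_list"] = per_group
print(df["as_list"].isna().all())  # True
```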

Thanks for your help!!

Answer 1

Given 3 dataframes

import pandas as pd

dfa = pd.DataFrame({'a': [1, 2, 3]})
dfb = pd.DataFrame({'a': ['a', 'b', 'c']})
dfc = pd.DataFrame({'a': ['pie', 'steak', 'milk']})

Given another dataframe whose 'table' column contains those dataframes:

df = pd.DataFrame({'name': ['Bob', 'Joe', 'Bob', 'Bob', 'Emily'], 'table': [dfa, dfa, dfb, dfc, dfb]})

# print the type for the first value in the table column, to confirm it's a dataframe
print(type(df.loc[0, 'table']))
[out]:
<class 'pandas.core.frame.DataFrame'>

Each group of dataframes can be combined into a single dataframe by using .groupby, aggregating each group into a list, and combining the dataframes in that list with pd.concat.

# if there is only one column, or if there are multiple columns of dataframes to aggregate
dfg = df.groupby('name').agg(lambda x: pd.concat(list(x)).reset_index(drop=True))

# display(dfg.loc['Bob', 'table'])
       a
0      1
1      2
2      3
3      a
4      b
5      c
6    pie
7  steak
8   milk

# to aggregate only selected column(s) when the dataframe has many columns
dfg = df.groupby('name')[['table']].agg(lambda x: pd.concat(list(x)).reset_index(drop=True))

Not a duplicate

Originally, I had marked this question as a duplicate of How to group dataframe rows into list in pandas groupby, thinking the dataframes could be aggregated into a list and then combined with pd.concat.

df.groupby('name')['table'].apply(list)
df.groupby('name').agg(list)
df.groupby('name')['table'].agg(list)
df.groupby('name').agg({'table': list})
df.groupby('name').agg(lambda x: list(x))

However, all of these raise a StopIteration error when the values being aggregated are dataframes.
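If the aggregation route misbehaves in your pandas version (the StopIteration above is one symptom), the same list-then-concat idea can be written as an explicit loop over the groups, which never asks agg to return a dataframe. This is a minimal sketch under that assumption: the helper name `concat_tables_by_group` is hypothetical, and `sort=False` is used only to keep groups in order of first appearance, matching the desired output in the question.

```python
import pandas as pd


def concat_tables_by_group(df, key, table_cols):
    """Hypothetical helper: for each value of `key`, stack the dataframes
    stored in each column listed in `table_cols` with pd.concat."""
    rows = []
    # sort=False keeps groups in order of first appearance (Bob, Joe, Emily)
    for name, group in df.groupby(key, sort=False):
        row = {key: name}
        for col in table_cols:
            # group[col] holds one dataframe per original row of this group
            row[col] = pd.concat(group[col].tolist(), ignore_index=True)
        rows.append(row)
    # Building from a list of dicts stores each combined dataframe as one cell
    return pd.DataFrame(rows)


dfa = pd.DataFrame({"a": [1, 2, 3]})
dfb = pd.DataFrame({"a": ["a", "b", "c"]})
dfc = pd.DataFrame({"a": ["pie", "steak", "milk"]})

df = pd.DataFrame([
    {"name": "Bob", "table": dfa},
    {"name": "Joe", "table": dfa},
    {"name": "Bob", "table": dfb},
    {"name": "Bob", "table": dfc},
    {"name": "Emily", "table": dfb},
])

result = concat_tables_by_group(df, "name", ["table"])
print(result["name"].tolist())  # ['Bob', 'Joe', 'Emily']
print(result.loc[0, "table"])
```

Passing more than one column name in `table_cols` handles the case of several dataframe columns, one pd.concat per column.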

Answer 2

Here, let's create a dataframe whose 'table' column holds dataframes.

First, I start with three dataframes:

import pandas as pd

# create the dataframes that we will assign to Bob and Joe; note the 'b' and 'j' prefixes in 'letter':

df1 = pd.DataFrame({'var1':[12, 34, -4, None], 'letter':['b1', 'b2', 'b3', 'b4']})
df2 = pd.DataFrame({'var1':[1, 23, 44, 0], 'letter':['j1', 'j2', 'j3', 'j4']})
df3 = pd.DataFrame({'var1':[22, -3, 7, 78], 'letter':['b5', 'b6', 'b7', 'b8']})

# make a list of dictionaries, one per row of the main dataframe:
list_of_dfs = [
    {'name':'Bob' ,'table':df1},
    {'name':'Joe' ,'table':df2},
    {'name':'Bob' ,'table':df3}
]

# construct the main dataframe:
original_df = pd.DataFrame(list_of_dfs)
print(original_df)

print(original_df.shape)  # (3, 2)

Now that the input dataframe exists, we produce the resulting new dataframe using groupby(), agg(), and pd.concat(), and then reset the index.

new_df = original_df.groupby('name')['table'].agg(lambda series: pd.concat(series.tolist())).reset_index()
print(new_df)

#check that Bob's table is now a concatenated table of df1 and df3:
new_df.loc[new_df['name'] == 'Bob', 'table'].iloc[0]

The output of the last line of code is:

    var1    letter
0   12.0    b1
1   34.0    b2
2   -4.0    b3
3    NaN    b4
0   22.0    b5
1   -3.0    b6
2    7.0    b7
3   78.0    b8
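The repeated labels 0-3 appear because pd.concat keeps each piece's original index by default. A short sketch, reusing df1 and df3 from above (redefined so it runs on its own): passing ignore_index=True renumbers the rows, which has the same effect as the .reset_index(drop=True) used in Answer 1. Inside the aggregation above, that would mean calling pd.concat(series.tolist(), ignore_index=True).

```python
import pandas as pd

# Same data as df1 and df3 above
df1 = pd.DataFrame({'var1': [12, 34, -4, None], 'letter': ['b1', 'b2', 'b3', 'b4']})
df3 = pd.DataFrame({'var1': [22, -3, 7, 78], 'letter': ['b5', 'b6', 'b7', 'b8']})

# Default: each piece keeps its own index, so labels 0-3 appear twice
stacked = pd.concat([df1, df3])
print(stacked.index.tolist())  # [0, 1, 2, 3, 0, 1, 2, 3]

# ignore_index=True discards the old labels and numbers rows 0..n-1
renumbered = pd.concat([df1, df3], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7]

# With duplicate labels, .loc[[0]] matches two rows; after renumbering, one
print(len(stacked.loc[[0]]), len(renumbered.loc[[0]]))  # 2 1
```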


2 answers

solution1 1 ACCEPTED 2020-10-07 22:11:08

solution2 1 2020-10-08 02:26:29
