I have a dictionary where each key is a file name and each value is a dataframe that looks like:
col1 col2
A 10
B 20
A 20
A 10
B 10
I want to group by 'col1', sum the values in 'col2', and store the result in a new dataframe 'df'. The output should look like:
Index A B
file1 40 30
file2 50 35
My code:
df=pd.DataFrame(columns=['A','B'])
for key, value in data.items():
    cnt=(value.groupby('Type')['Packets'].sum())
    print(cnt)
    df.append(cnt,ignore_index=True)
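You should avoid appending in a loop: it is inefficient and not recommended. Note also that df.append returns a new dataframe rather than modifying df in place, so the loop above discards every result; the method was removed entirely in pandas 2.0.
Instead, you can concatenate your dataframes into one large dataframe, then use pivot_table: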
# aggregate values in your dictionary, adding a "file" series
df_comb = pd.concat((v.assign(file=k) for k, v in data.items()), ignore_index=True)
# perform 'sum' aggregation, specifying index, columns & values
df = df_comb.pivot_table(index='file', columns='col1', values='col2', aggfunc='sum')
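To try the two lines above end to end, here is a minimal sketch. It assumes a hypothetical dictionary named data with two entries, file1 and file2; the file1 values come from the question, and the file2 values are illustrative, chosen so that the totals match the desired output.

```python
import pandas as pd

# Hypothetical sample input: file name -> dataframe with col1/col2.
data = {
    "file1": pd.DataFrame({"col1": ["A", "B", "A", "A", "B"],
                           "col2": [10, 20, 20, 10, 10]}),
    "file2": pd.DataFrame({"col1": ["A", "B", "A", "A", "B"],
                           "col2": [30, 10, 15, 5, 25]}),
}

# Tag every row with its file name, then stack all frames into one.
df_comb = pd.concat((v.assign(file=k) for k, v in data.items()), ignore_index=True)

# One row per file, one column per col1 category, summing col2.
df = df_comb.pivot_table(index="file", columns="col1", values="col2", aggfunc="sum")
print(df)
# file1: A=40, B=30; file2: A=50, B=35
```

Because the file name travels with each row as an ordinary column, this works for any number of files and any set of categories in col1.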
Explanation
v.assign(file=k) adds a series file to each dataframe, with its value set to the filename. pd.concat concatenates all the dataframes in your dictionary. pd.DataFrame.pivot_table is a Pandas method which allows you to create Excel-style pivot tables by specifying index, columns, values and aggfunc (aggregation function).
Another suggested way uses group-by, transpose, and row stacking into a dataframe:
import pandas as pd
import numpy as np
df_1 = pd.DataFrame({'col1':['A', 'B', 'A', 'A', 'B'], 'col2':[10, 20, 20, 10, 10]})
df_2 = pd.DataFrame({'col1':['A', 'B', 'A', 'A', 'B'], 'col2':[30, 10, 15, 5, 25]})
df_1_agg = df_1.groupby(['col1']).agg({'col2':'sum'}).T.values
df_2_agg = df_2.groupby(['col1']).agg({'col2':'sum'}).T.values
# np.vstack replaces np.row_stack, which is a deprecated alias as of NumPy 2.0
pd.DataFrame(np.vstack((df_1_agg, df_2_agg)), index=['file1', 'file2']).rename(columns={0: 'A', 1: 'B'})
Edit: to generalize, put the aggregation inside a loop (or a function). You also need to format the index (file{i}) for the general case.
lst_df = [df_1, df_2]
df_all = []
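for i in lst_df:
    # iterate over every dataframe
    df_agg = i.groupby(['col1']).agg({'col2':'sum'}).T.values
    # append to the accumulator
    df_all.append(df_agg)
# build the index as file1, file2, ... to match the number of dataframes
idx = ['file{}'.format(n) for n in range(1, len(df_all) + 1)]
# np.vstack replaces np.row_stack, which is a deprecated alias as of NumPy 2.0
pd.DataFrame(np.vstack(df_all), index=idx).rename(columns={0: 'A', 1: 'B'})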
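The loop above works on a list and hard-codes the column names A and B. If the frames live in a dictionary keyed by file name, as in the question, one possible variation is to keep each group-by result as a Series and let pandas align the labels. This is only a sketch: the helper name sum_by_group and its parameters are hypothetical, and the data dictionary is sample input.

```python
import pandas as pd

# Hypothetical sample input: file name -> dataframe with col1/col2.
data = {
    "file1": pd.DataFrame({"col1": ["A", "B", "A", "A", "B"],
                           "col2": [10, 20, 20, 10, 10]}),
    "file2": pd.DataFrame({"col1": ["A", "B", "A", "A", "B"],
                           "col2": [30, 10, 15, 5, 25]}),
}

def sum_by_group(frames, group_col="col1", value_col="col2"):
    """Return one row per dictionary key with per-group sums as columns."""
    # Each group-by yields a Series indexed by the group labels (A, B, ...).
    totals = {name: frame.groupby(group_col)[value_col].sum()
              for name, frame in frames.items()}
    # A dict of Series becomes one column per file; transpose so files are rows.
    return pd.DataFrame(totals).T

result = sum_by_group(data)
print(result)
# file1: A=40, B=30; file2: A=50, B=35
```

With this approach the row labels come from the dictionary keys and the column labels from the values in col1, so nothing needs renaming afterwards. A category missing from one file shows up as NaN in that file's row.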