Pandas groupby and sum into another dataframe

Question

I have a dictionary where each key is a file name and each value is a dataframe that looks like this:

col1     col2
A        10
B        20
A        20 
A        10
B        10

I want to group by 'col1', sum the values in 'col2', and store the result in a new dataframe 'df' with one row per file. The output should look like:

Index    A      B  
file1     40     30
file2     50     35

My code:

df=pd.DataFrame(columns=['A','B'])
for key, value in data.items():

    cnt=(value.groupby('Type')['Packets'].sum())
    print(cnt)

    df.append(cnt,ignore_index=True)

Answer 1

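You should try to avoid appending in a loop. This is inefficient and not recommended, because every append copies the whole dataframe. Note also that df.append returns a new dataframe rather than modifying df in place, so the loop above discards each result, and that DataFrame.append was removed in pandas 2.0.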

Instead, you can concatenate your dataframes into one large dataframe, then use pivot_table:

# aggregate values in your dictionary, adding a "file" series
df_comb = pd.concat((v.assign(file=k) for k, v in data.items()), ignore_index=True)

# perform 'sum' aggregation, specifying index, columns & values
df = df_comb.pivot_table(index='file', columns='col1', values='col2', aggfunc='sum')

Explanation

v.assign(file=k) adds a column named file to each dataframe, with every value set to the file name.
pd.concat concatenates all the dataframes in your dictionary into one.
pd.DataFrame.pivot_table is a Pandas method which lets you create Excel-style pivot tables by specifying index, columns, values and aggfunc (the aggregation function).
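
To see these steps end to end, here is a minimal sketch. The dictionary `data` and its keys `file1` and `file2` are made-up sample inputs (the second frame borrows the values used in Answer 2), not the asker's real files. With this input, file1 should sum to 40 and 30, and file2 to 50 and 35.

```python
import pandas as pd

# Hypothetical sample input: file name -> dataframe (assumed for illustration)
data = {
    'file1': pd.DataFrame({'col1': ['A', 'B', 'A', 'A', 'B'],
                           'col2': [10, 20, 20, 10, 10]}),
    'file2': pd.DataFrame({'col1': ['A', 'B', 'A', 'A', 'B'],
                           'col2': [30, 10, 15, 5, 25]}),
}

# tag each frame with its file name, then stack them into one long frame
df_comb = pd.concat((v.assign(file=k) for k, v in data.items()), ignore_index=True)

# one row per file, one column per distinct value of col1
df = df_comb.pivot_table(index='file', columns='col1', values='col2', aggfunc='sum')
print(df)

# the same table via a two-key groupby, moving col1 from the index to the columns
alt = df_comb.groupby(['file', 'col1'])['col2'].sum().unstack()
```

The groupby/unstack form is just another spelling of the same aggregation; which one reads better is a matter of taste.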

Answer 2

Another way: group by, transpose, and row-stack the results into a dataframe.

import pandas as pd
import numpy as np

df_1 = pd.DataFrame({'col1':['A', 'B', 'A', 'A', 'B'], 'col2':[10, 20, 20, 10, 10]})
df_2 = pd.DataFrame({'col1':['A', 'B', 'A', 'A', 'B'], 'col2':[30, 10, 15, 5, 25]})
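# sum col2 per group, then transpose to a 1 x n_groups array (the group labels are dropped)
df_1_agg = df_1.groupby(['col1']).agg({'col2':'sum'}).T.values
df_2_agg = df_2.groupby(['col1']).agg({'col2':'sum'}).T.values
# np.vstack replaces np.row_stack, an alias that is deprecated as of NumPy 2.0
pd.DataFrame(np.vstack((df_1_agg, df_2_agg)), index = ['file1', 'file2']).rename(columns = {0:'A', 1:'B'})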

Edit: to generalize, put the aggregation into a loop (or a function) over the dataframes. For the general case you also need to generate the index labels (file{i}).

lst_df = [df_1, df_2]

df_all = []

for frame in lst_df:
    # aggregate each dataframe
    df_agg = frame.groupby(['col1']).agg({'col2':'sum'}).T.values

    # append to the accumulator
    df_all.append(df_agg)

# generate the labels file1, file2, ... instead of hard-coding them
pd.DataFrame(np.vstack(df_all), index = [f'file{n}' for n in range(1, len(df_all) + 1)]).rename(columns = {0:'A', 1:'B'})
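
One caveat with this approach: `.values` throws away the group labels, and 'A'/'B' are put back by position. That fails or mislabels columns as soon as the files do not all contain the same groups. A sketch that keeps the labels is below. It assumes the frames live in a dictionary keyed by file name, as in the question; the `data` contents are sample values, not the asker's real files.

```python
import pandas as pd

# Hypothetical sample input: file name -> dataframe (assumed for illustration)
data = {
    'file1': pd.DataFrame({'col1': ['A', 'B', 'A', 'A', 'B'],
                           'col2': [10, 20, 20, 10, 10]}),
    'file2': pd.DataFrame({'col1': ['A', 'B', 'A', 'A', 'B'],
                           'col2': [30, 10, 15, 5, 25]}),
}

# one summed Series per file, indexed by the values of col1
sums = {name: frame.groupby('col1')['col2'].sum() for name, frame in data.items()}

# the dict keys become columns, aligned on the group labels;
# transpose so that each file is a row
df = pd.DataFrame(sums).T
print(df)
```

Because the Series are aligned on their index, a group missing from one file shows up as NaN in that file's row rather than shifting the other values.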


Answer 1
0 2018-11-05 16:00:05

Answer 2
0 2018-11-05 16:07:34
