
Group by month, sum rows based on column, and keep the other columns

I have a DataFrame df as follows:

|size    | date        | name | type     | revenue |
|10      | 13/12/2021  | A    | Standard | 0,2     |
|248743  | 15/12/2021  | A    | Standard | 0,2     |
|234     | 03/12/2022  | A    | Basic    | 0,1     |
|8734684 | 31/03/2022  | B    | Basic    | 0,1     |
|3589749 | 01/04/2021  | C    | Basic    | 0,4     |
|3356943 | 02/04/2021  | A    | Basic    | 0,1     |
|6908746 | 21/04/2021  | A    | Basic    | 0,1     |
|2375940 | 21/02/2022  | D    | Premium  | 0,7     |
|9387295 | 21/02/2022  | D    | Premium  | 0,7     |
|286432  | 21/02/2022  | D    | Premium  | 0,7     |
|192     | 31/03/2022  | D    | Premium  | 0,7     |
|486     | 18/02/2022  | E    | Standard | 0,9     |
|23847   | 24/10/2021  | F    | Basic    | 0,3     |
|82346   | 12/11/2021  | B    | Premium  | 0,5     |
|28352   | 03/01/2022  | A    | Basic    | 0,1     |

I need to group by month and sum size for rows whose name and type are the same:

|size    | date | name | type     | revenue |
|28352   | Jan  | A    | Basic    | 0,1     |
|486     | Feb  | E    | Standard | 0,9     |
|12049667| Feb  | D    | Premium  | 0,7     |
|192     | Mar  | D    | Premium  | 0,7     |
|8734684 | Mar  | B    | Basic    | 0,1     |
|3589749 | Apr  | C    | Basic    | 0,4     |
|10265689| Apr  | A    | Basic    | 0,1     |
|23847   | Oct  | F    | Basic    | 0,3     |
|82346   | Nov  | B    | Premium  | 0,5     |
|248753  | Dec  | A    | Standard | 0,2     |
|234     | Dec  | A    | Basic    | 0,1     |

I tried this code, but it did not work:

df['date'] = pd.to_datetime(df['date'])
df1 = df.groupby(df['date'].dt.strftime('%B'))['size'].sum()
df2 = df1.groupby(['date', 'name', 'type', 'revenue'],as_index=False).sum()

How can I do it?

IIUC, you need a single groupby. You also need to convert your "revenue" column to numeric, since it holds strings with decimal commas.

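import pandas as pd

# Dates are day-first (DD/MM/YYYY)
df['date'] = pd.to_datetime(df['date'], dayfirst=True)

# Grouping key: abbreviated month name (Jan, Feb, ...)
group = df['date'].dt.strftime('%b')

# Convert the decimal-comma strings to floats, then sum per (month, name, type).
# The numeric columns are selected explicitly so the datetime 'date' column is not summed.
(df.assign(revenue=pd.to_numeric(df['revenue'].str.replace(',', '.')))
   .groupby([group, 'name', 'type'])[['size', 'revenue']]
   .sum()
   .reset_index()
 )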

Output:

   date name      type      size  revenue
0   Apr    A     Basic  10265689      0.2
1   Apr    C     Basic   3589749      0.4
2   Dec    A     Basic       234      0.1
3   Dec    A  Standard    248753      0.4
4   Feb    D   Premium  12049667      2.1
5   Feb    E  Standard       486      0.9
6   Jan    A     Basic     28352      0.1
7   Mar    B     Basic   8734684      0.1
8   Mar    D   Premium       192      0.7
9   Nov    B   Premium     82346      0.5
10  Oct    F     Basic     23847      0.3
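
The aggregation above also sums revenue (for example 2.1 for D/Premium in February), whereas the expected output in the question keeps the per-row value (0,7). A minimal sketch of one way to keep it, assuming revenue is constant within each (month, name, type) group, is named aggregation with 'first' for that column. The small df below is a hypothetical subset of the question's data, and the key name "month" is my own choice, used only to keep the snippet self-contained.

```python
import pandas as pd

# Hypothetical subset of the question's data (revenue uses decimal commas)
df = pd.DataFrame({
    "size": [2375940, 9387295, 286432, 192, 486],
    "date": ["21/02/2022", "21/02/2022", "21/02/2022", "31/03/2022", "18/02/2022"],
    "name": ["D", "D", "D", "D", "E"],
    "type": ["Premium", "Premium", "Premium", "Premium", "Standard"],
    "revenue": ["0,7", "0,7", "0,7", "0,7", "0,9"],
})

df["date"] = pd.to_datetime(df["date"], dayfirst=True)
df["revenue"] = pd.to_numeric(df["revenue"].str.replace(",", ".", regex=False))

# Month abbreviation as grouping key, renamed so it does not clash
# with the existing 'date' column when the index is reset
month = df["date"].dt.strftime("%b").rename("month")

out = (
    df.groupby([month, "name", "type"])
      .agg(size=("size", "sum"),            # add up the sizes
           revenue=("revenue", "first"))    # assumption: same value within a group
      .reset_index()
)
print(out)
```

If revenue can differ within a group, adding 'revenue' to the groupby keys instead would keep each distinct value on its own row.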

Note that the above aggregates the same month of different years into one group. If you want to keep years separate, use a period:

group = df['date'].dt.to_period('M')

Output:

       date name      type      size  revenue
0   2021-04    A     Basic  10265689      0.2
1   2021-04    C     Basic   3589749      0.4
2   2021-10    F     Basic     23847      0.3
3   2021-11    B   Premium     82346      0.5
4   2021-12    A  Standard    248753      0.4
5   2022-01    A     Basic     28352      0.1
6   2022-02    D   Premium  12049667      2.1
7   2022-02    E  Standard       486      0.9
8   2022-03    B     Basic   8734684      0.1
9   2022-03    D   Premium       192      0.7
10  2022-12    A     Basic       234      0.1
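
Grouping on the '%b' strings sorts the months alphabetically (Apr, Dec, Feb, ...), while the expected output in the question lists them in calendar order. A rough sketch of one way to get calendar order, assuming years may be pooled as in the first variant, is to group on the numeric month and map it to an abbreviation afterwards. The df below is a hypothetical subset of the question's data, and MONTHS and the key name "month" are illustrative names, not part of the original answer.

```python
import pandas as pd

# Hypothetical subset of the question's data
df = pd.DataFrame({
    "size": [28352, 486, 3589749, 3356943, 6908746, 23847],
    "date": ["03/01/2022", "18/02/2022", "01/04/2021",
             "02/04/2021", "21/04/2021", "24/10/2021"],
    "name": ["A", "E", "C", "A", "A", "F"],
    "type": ["Basic", "Standard", "Basic", "Basic", "Basic", "Basic"],
})
df["date"] = pd.to_datetime(df["date"], dayfirst=True)

# Fixed English abbreviations, so the result does not depend on the locale
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Numeric month (1-12) as grouping key: groupby sorts it in calendar order
month_num = df["date"].dt.month.rename("month")

out = (
    df.groupby([month_num, "name", "type"])["size"]
      .sum()
      .reset_index()
)

# Replace the month number with its abbreviation after sorting is done
out["month"] = out["month"].map(dict(enumerate(MONTHS, start=1)))
print(out)
```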


