DataFrame: group by one column and sum another column's values for selected items

Question

My current dataframe is:

import pandas as pd

dd = [[1001,'green apple',1,7],[1001,'red apple',1,2],[1001,'grapes',1,5],[1002,'green apple',2,4],[1002,'red apple',2,4],[1003,'red apple',3,8],[1004,'mango',4,2],[1004,'red apple',4,6]]
df = pd.DataFrame(dd, columns=['colID','colString','custID','colQuantity'])

   colID     colString     custID     colQuantity 
0   1001    green apple     1            7
1   1001    red apple       1            2
2   1001    grapes          1            5
3   1002    green apple     2            4
4   1002    red apple       2            4
5   1003    red apple       3            8
6   1004    mango           4            2
7   1004    red apple       4            6

So far I have only managed to filter the rows that contain red and green apples, using this code:

selection = ['green apple','red apple']
mask = df.colString.apply(lambda x: any(item for item in selection if item in x))
df = df[mask]

Current output:

   colID     colString     custID     colQuantity 
0   1001    green apple     1            7
1   1001    red apple       1            2
3   1002    green apple     2            4
4   1002    red apple       2            4
5   1003    red apple       3            8
7   1004    red apple       4            6

The final output I want is the sum of the green apple and red apple quantities for rows that share the same colID:

   colID   custID colQuantity
   1001      1        9
   1002      2        8

Answer 1

You can index the dataframe with isin and then use groupby.sum:

(df[df.colString.isin(['green apple', 'red apple'])]
   .groupby(['colID','colString'], as_index=False)
   .sum())

   colID    colString  custID  colQuantity
0   1001  green apple       1            7
1   1001    red apple       1            2
2   1002  green apple       2            4
3   1002    red apple       2            4
4   1003    red apple       3            8
5   1004    red apple       4            6
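
The grouping above sums per colID and colString, so it does not yet collapse to one row per colID as the question asks. A minimal sketch of one way to get there, assuming that only colID groups containing every item in selection should be kept (as the desired output suggests) and that custID is constant within a colID (as in the sample data). The names filtered, has_all and result are illustrative, not from the original answer:

```python
import pandas as pd

dd = [[1001, 'green apple', 1, 7], [1001, 'red apple', 1, 2], [1001, 'grapes', 1, 5],
      [1002, 'green apple', 2, 4], [1002, 'red apple', 2, 4], [1003, 'red apple', 3, 8],
      [1004, 'mango', 4, 2], [1004, 'red apple', 4, 6]]
df = pd.DataFrame(dd, columns=['colID', 'colString', 'custID', 'colQuantity'])

selection = ['green apple', 'red apple']

# Keep only the rows whose colString is one of the wanted items.
filtered = df[df['colString'].isin(selection)]

# Assumption: a colID qualifies only if it contains every item in `selection`.
# Count the distinct wanted items per colID and compare with len(selection).
has_all = filtered.groupby('colID')['colString'].transform('nunique') == len(selection)

# Assumption: custID is constant per colID, so grouping on both keeps it as a column.
result = (filtered[has_all]
          .groupby(['colID', 'custID'], as_index=False)['colQuantity']
          .sum())

print(result)
```

On the sample data this leaves colID 1001 and 1002 with quantities 9 and 8. If colIDs with only one of the items should also be included, dropping the has_all filter would be enough.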

Solution 1
2 2020-05-22 14:43:31
