How can I sum two different columns in one pass when one of them contains Decimal objects in pandas?

Question

I have a DataFrame and I want to sum two of its columns. Here is df.head(5) of my original DataFrame.

   price           name  quantity transaction_amount
pk                                                  
48  1.00      Product 1         1               1.00
48  1.00      Product 1         4               4.00
63  1.00      Product 2         2               2.00
63  1.00      Product 2         3               3.00
63  1.00      Product 2         1               1.00

I want to group the rows by pk, which is the product's database primary key, and get the sums of the transaction_amount and quantity columns. But when I run df.groupby(['pk', 'name']).sum() I get:

                          quantity
pk name                           
48 Product 1                   543
63 Product 2                 17234
38 Product 3                  4014
39 Product 4                 11053
40 Product 5                 13406

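Where is the transaction_amount column? transaction_amount is the product of quantity and the item's price in that transaction. The price can change from one transaction to the next if there is a discount or other markdown, and we need to record what was actually charged for the item at purchase time. So the result I want has quantity (total quantity), transaction_amount (total amount), name and pk, like this: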

                          quantity  transaction_amount
pk name                           
48 Product 1                   543              543.00
63 Product 2                 17234           89,000.93
38 Product 3                  4014            2,000.32
39 Product 4                 11053           25,000.36
40 Product 5                 13406            6,000.12

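I read the documentation for .sum(), but none of the options work for me. If I drop the price column and run .sum(level=0), it takes a very long time. Compare the timings of the two approaches (the faster one sums only the quantity column).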

In [237]: %%timeit
     ...: df.groupby(['pk', 'name']).sum(level=0)
     ...: 
1 loop, best of 3: 3.04 s per loop

In [239]: %%timeit
     ...: df.groupby(['pk', 'name']).sum()
     ...: 
     ...: 
10 loops, best of 3: 42.4 ms per loop

The result with .sum(axis=1) is similar.

Answer 1

When I run

df.groupby(['pk', 'name']).sum()

I get

              price  quantity  transaction_amount
pk name                                          
48 Product 1    2.0         5                 5.0
63 Product 2    3.0         6                 6.0

This suggests to me that your price and transaction_amount columns hold objects (object dtype) rather than native numeric values.
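
One way to confirm that diagnosis is to inspect the column dtypes and the Python type of the stored values. The following is a minimal sketch that rebuilds the five sample rows from the question; it assumes the price and transaction_amount values are decimal.Decimal instances, as the second answer suggests.

```python
from decimal import Decimal

import pandas as pd

# Sample rows from the question; the Decimal values are an assumption.
df = pd.DataFrame({
    "pk": [48, 48, 63, 63, 63],
    "price": [Decimal("1.00")] * 5,
    "name": ["Product 1", "Product 1", "Product 2", "Product 2", "Product 2"],
    "quantity": [1, 4, 2, 3, 1],
    "transaction_amount": [Decimal("1.00"), Decimal("4.00"), Decimal("2.00"),
                           Decimal("3.00"), Decimal("1.00")],
}).set_index("pk")

# Columns that hold Decimal values are reported with dtype "object".
print(df.dtypes)

# Collect the Python types actually stored in the column.
element_types = {type(value) for value in df["transaction_amount"]}
print(element_types)
```

If the column reports object dtype and the stored type is Decimal, the default groupby sum in the asker's pandas version would have treated it as non-numeric, which matches the missing column.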

Answer 2

Since you are using decimal.Decimal objects, numpy.sum will not handle them. So just defer to the built-in sum:

In [18]: df
Out[18]:
   pk price       name  quantity transaction_amount
0  48   1.0  Product 1         1                1.0
1  48   1.0  Product 1         4                4.0
2  63   1.0  Product 2         2                2.0
3  63   1.0  Product 2         3                3.0
4  63   1.0  Product 2         1                1.0

In [19]: df.groupby(['pk', 'name']).aggregate({
    ...:     "quantity":np.sum,
    ...:     "price":sum,
    ...:     "transaction_amount":sum
    ...: })
Out[19]:
             price  quantity transaction_amount
pk name
48 Product 1   2.0         5                5.0
63 Product 2   3.0         6                6.0

Note that this will be slow, but that is the price you pay for using object dtype columns.
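
The session above can be reproduced as a standalone script. This is a sketch under a few assumptions: the sample data is rebuilt by hand, the hypothetical helper decimal_sum wraps the built-in sum with a Decimal start value so the result stays a Decimal, and quantity uses the string alias "sum" instead of np.sum.

```python
from decimal import Decimal

import pandas as pd

# Sample rows from the question, with pk kept as a regular column.
df = pd.DataFrame({
    "pk": [48, 48, 63, 63, 63],
    "name": ["Product 1", "Product 1", "Product 2", "Product 2", "Product 2"],
    "quantity": [1, 4, 2, 3, 1],
    "transaction_amount": [Decimal("1.00"), Decimal("4.00"), Decimal("2.00"),
                           Decimal("3.00"), Decimal("1.00")],
})


def decimal_sum(values):
    """Hypothetical helper: add Decimal values with Python's built-in sum."""
    return sum(values, Decimal("0"))


# quantity is summed by pandas natively; the Decimal column is summed
# in pure Python, which is exact but slower on large frames.
totals = df.groupby(["pk", "name"]).agg({
    "quantity": "sum",
    "transaction_amount": decimal_sum,
})
print(totals)
```

Keeping the Decimal column in pure Python preserves exact decimal arithmetic for the monetary totals, at the cost of speed.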

Answer 3

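You can specify the columns to sum like this (pass the column names as a list; selecting several columns with a bare tuple is not accepted by current pandas versions).

df.groupby(['pk', 'name'])[['quantity', 'transaction_amount']].sum()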
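
As a runnable illustration of selecting columns before summing, here is a small sketch on the question's sample rows, assuming Decimal values in transaction_amount. Whether the object dtype column appears in the result may depend on the pandas version, so the sketch prints the resulting columns rather than assuming them.

```python
from decimal import Decimal

import pandas as pd

# Sample rows from the question; the Decimal values are an assumption.
df = pd.DataFrame({
    "pk": [48, 48, 63, 63, 63],
    "name": ["Product 1", "Product 1", "Product 2", "Product 2", "Product 2"],
    "quantity": [1, 4, 2, 3, 1],
    "transaction_amount": [Decimal("1.00"), Decimal("4.00"), Decimal("2.00"),
                           Decimal("3.00"), Decimal("1.00")],
})

# Select the columns with a list (double brackets), then sum per group.
selected_totals = df.groupby(["pk", "name"])[["quantity", "transaction_amount"]].sum()

# Check which of the selected columns made it into the result.
print(selected_totals.columns.tolist())
print(selected_totals)
```

If transaction_amount is missing from the printed columns, the explicit aggregate approach from the second answer is the fallback.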
