Is there a way to write a custom cumulative aggregate function using the groupby clause of a pandas dataframe?

Question

Here's my dataframe:

+--------+-------------+----------+---------------+------------+-------------+-----------+
|        | Customer ID | Quantity | Invoice Value |       Date | InvoiceDate | UnitPrice |
+--------+-------------+----------+---------------+------------+-------------+-----------+
|    0   |   500249347 |      0.0 |         0.000 | 2018-01-02 |  2018-01-02 |     0.000 |
+--------+-------------+----------+---------------+------------+-------------+-----------+
|    1   |   500006647 |      1.0 |        33.715 | 2018-01-02 |  2018-01-02 |    33.715 |
+--------+-------------+----------+---------------+------------+-------------+-----------+
|    2   |   500407469 |      1.0 |        33.715 | 2018-01-02 |  2018-01-02 |    33.715 |
+--------+-------------+----------+---------------+------------+-------------+-----------+
|    3   |   500642846 |      0.0 |         0.000 | 2018-01-02 |  2018-01-02 |     0.000 |
+--------+-------------+----------+---------------+------------+-------------+-----------+
|    4   |   500005450 |      1.0 |        33.715 | 2018-01-02 |  2018-01-02 |    33.715 |
+--------+-------------+----------+---------------+------------+-------------+-----------+
|   ...  |         ... |      ... |           ... |        ... |         ... |       ... |
+--------+-------------+----------+---------------+------------+-------------+-----------+
| 429545 |   500717072 |      1.0 |        45.620 | 2019-03-31 |  2019-03-31 |    45.620 |
+--------+-------------+----------+---------------+------------+-------------+-----------+
| 429546 |   500105174 |      0.0 |         0.000 | 2019-03-31 |  2019-03-31 |     0.000 |
+--------+-------------+----------+---------------+------------+-------------+-----------+
| 429547 |   500069720 |      0.0 |         0.000 | 2019-03-31 |  2019-03-31 |     0.000 |
+--------+-------------+----------+---------------+------------+-------------+-----------+
| 429548 |   500105528 |      0.0 |         0.000 | 2019-03-31 |  2019-03-31 |     0.000 |
+--------+-------------+----------+---------------+------------+-------------+-----------+
| 429549 |   500732322 |      0.0 |         0.000 | 2019-03-31 |  2019-03-31 |     0.000 |
+--------+-------------+----------+---------------+------------+-------------+-----------+

I want to extract features (new columns) for each customer, computed relative to the snapshot date of each row: days since last visit, last billed amount, last non-zero billed amount, quantity, days since last purchase, and so on. Can this be done with a custom cumulative aggregate function, or is there a simpler way?

Answer 1

I would suggest something like this:

import pandas as pd

# Toy data; the dates are ISO-formatted strings, so sorting them as text is chronological.
df = pd.DataFrame({'customer_id': [13, 16, 13, 13, 16, 16, 13],
                   'Date': ['2018-01-02', '2019-03-31', '2019-03-31', '2018-01-02', '2018-01-02', '2019-04-30',
                            '2018-01-02'],
                   'Invoice_value': [920, 920, 920, 920, 921, 921, 921],
                   'Unit_price': [1, 2, 3, 4, 6, 7, 8]})

# For each customer, sort that customer's rows by date and keep the most recent row.
append_data = [df[(df['customer_id'] == ac)].sort_values(by=['Date']).iloc[-1] for ac in df.customer_id.unique()]
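The same "most recent row per customer" lookup can probably be done without a Python-level loop. This is only a sketch under assumptions: it rebuilds a small toy frame with valid datetime values so that it runs on its own, and the name `latest` is made up for illustration.

```python
import pandas as pd

# Toy data (assumed for illustration), with dates parsed as real datetimes.
df = pd.DataFrame({
    'customer_id': [13, 16, 13, 13, 16, 16, 13],
    'Date': pd.to_datetime(['2018-01-02', '2019-03-31', '2019-03-31', '2018-01-02',
                            '2018-01-02', '2019-04-30', '2018-01-02']),
    'Invoice_value': [920, 920, 920, 920, 921, 921, 921],
    'Unit_price': [1, 2, 3, 4, 6, 7, 8],
})

# Sort by date with a stable sort (ties keep their original order),
# then keep the last row of each customer group.
latest = df.sort_values('Date', kind='mergesort').groupby('customer_id').tail(1)
print(latest)
```

Unlike the list comprehension, this returns a DataFrame rather than a list of Series, which is usually easier to merge back onto the original data.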

Answer 2

For time since last visit, I figured something like this:

# Requires 'Date' to be a datetime column, with rows in date order within each customer.
df['last_visited'] = df.groupby('Customer ID')['Date'].diff()
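Building on the same idea, the other features from the question can likely be derived with the built-in grouped `diff`, `shift`, and `ffill` methods rather than a custom cumulative function. The following is a minimal sketch, not a definitive solution: the toy data and the new column names (`days_since_last_visit`, `last_billed`, `last_nonzero_billed`, `days_since_last_purchase`) are assumptions, and a "purchase" is assumed to mean a row with a non-zero invoice value.

```python
import pandas as pd

# Hypothetical toy data using the question's column names.
df = pd.DataFrame({
    'Customer ID': [1, 2, 1, 1, 2, 1],
    'Date': pd.to_datetime(['2018-01-02', '2018-01-02', '2018-01-05',
                            '2018-01-09', '2018-01-10', '2018-01-12']),
    'Invoice Value': [33.715, 0.0, 0.0, 45.62, 10.0, 0.0],
})

# Order rows chronologically within each customer so that
# "previous row" means "previous visit".
df = df.sort_values(['Customer ID', 'Date'], kind='mergesort').reset_index(drop=True)
cust = df['Customer ID']

# Days since the customer's previous visit (NaN on the first visit).
df['days_since_last_visit'] = df.groupby('Customer ID')['Date'].diff().dt.days

# Amount billed on the previous visit.
df['last_billed'] = df.groupby('Customer ID')['Invoice Value'].shift()

# Last non-zero billed amount before the current row:
# mask zeros, shift by one visit, then forward-fill within each customer.
nonzero = df['Invoice Value'].where(df['Invoice Value'] != 0)
prev_nonzero = nonzero.groupby(cust).shift()
df['last_nonzero_billed'] = prev_nonzero.groupby(cust).ffill()

# Days since the previous purchase (assumed: a visit with a non-zero invoice).
purchase_date = df['Date'].where(df['Invoice Value'] != 0)
prev_purchase = purchase_date.groupby(cust).shift().groupby(cust).ffill()
df['days_since_last_purchase'] = (df['Date'] - prev_purchase).dt.days

print(df)
```

Shifting before forward-filling keeps the current row out of its own feature, so each value only reflects visits strictly before that row's date.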
