How to group rows based on a specific value in a row and another column in pandas?

Question

I can't think of a great way to ask this in one sentence, so I'll show what I want to do.

Let's say I have a table where each row is an event fired by someone going through a book rental process. There are 2 events: basket (contains the books they want) and checkout (fired when checkout is successful and the books are rented). I want to group by name, but also by each checkout event together with every basket event that precedes it. Here's an example group, grouped only on name="tim".

|-------------------------------------------------|
| time | name |  stage   |        payload         |
|-------------------------------------------------|
| 1000 | tim  |  basket  | harrypotter;twilight;  |
|-------------------------------------------------|
| 1001 | tim  |  basket  | harrypotter;           |
|-------------------------------------------------|
| 1002 | tim  | checkout | Order# 123456789       |
|-------------------------------------------------|
| 1003 | tim  |  basket  | pandasfordummies;      |
|-------------------------------------------------|
| 1004 | tim  | checkout | Order# 145246263       |
|-------------------------------------------------|

My question is: how can I group so that each group contains exactly 1 checkout event, like this:

First Order

|-------------------------------------------------|
| time | name |  stage   |        payload         |
|-------------------------------------------------|
| 1000 | tim  |  basket  | harrypotter;twilight;  |
|-------------------------------------------------|
| 1001 | tim  |  basket  | harrypotter;           |
|-------------------------------------------------|
| 1002 | tim  | checkout | Order# 123456789       |
|-------------------------------------------------|

Second Order

|-------------------------------------------------|
| time | name |  stage   |        payload         |
|-------------------------------------------------|
| 1003 | tim  |  basket  | pandasfordummies;      |
|-------------------------------------------------|
| 1004 | tim  | checkout | Order# 145246263       |
|-------------------------------------------------|

Sorry if this is worded terribly.

Answer 1

Assuming your table is in a pandas dataframe and is already sorted by time and name, you can use the following code:

import numpy as np
import pandas as pd

# Sample data from the question: one user, two orders
df = pd.DataFrame({'time': [1000, 1001, 1002, 1003, 1004],
                   'name': ['tim', 'tim', 'tim', 'tim', 'tim'],
                   'stage': ['basket', 'basket', 'checkout', 'basket', 'checkout'],
                   'payload': ['harrypotter;twilight;', 'harrypotter;', 'Order# 123456789',
                               'pandasfordummies;', 'Order# 145246263']})

# Positions of the checkout rows; +1 so each split ends just after a checkout
orders = np.split(df, np.where(df.stage == 'checkout')[0] + 1)

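This creates a list of the split dataframes in orders, which you can access normally as orders[0], orders[1], etc. Because the last row of the sample data is a checkout, the final element of the list is an empty dataframe. Note that this splits only on checkout rows and does not separate different names.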
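
If the table contains more than one name, one possible alternative is to label each row with a per-name order number and group on that instead. The sketch below is not part of the original answer; it assumes the same column names as above, and the rows for "ann" and the order_id column are hypothetical additions for illustration. It also avoids calling np.split on a dataframe, which may warn or fail on newer pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "time": [1000, 1001, 1002, 1003, 1004, 1001, 1005],
    # The two "ann" rows are hypothetical, added to show grouping by name.
    "name": ["tim", "tim", "tim", "tim", "tim", "ann", "ann"],
    "stage": ["basket", "basket", "checkout", "basket", "checkout",
              "basket", "checkout"],
    "payload": ["harrypotter;twilight;", "harrypotter;", "Order# 123456789",
                "pandasfordummies;", "Order# 145246263",
                "dune;", "Order# 000000001"],
})

# Make sure events are in chronological order within each name.
df = df.sort_values(["name", "time"]).reset_index(drop=True)

# 1 for checkout rows, 0 for basket rows.
is_checkout = df["stage"].eq("checkout").astype(int)

# Number of checkouts seen strictly before each row, per name.
# Subtracting the row's own flag keeps a checkout in the same
# group as the basket events that lead up to it.
df["order_id"] = is_checkout.groupby(df["name"]).cumsum() - is_checkout

# One dataframe per (name, order_id) pair.
orders = [group for _, group in df.groupby(["name", "order_id"])]

for order in orders:
    print(order, end="\n\n")
```

With this labelling, basket events that are never followed by a checkout end up in a final group of their own with no checkout row, so you may want to filter those out depending on what you need.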
