How to group rows based on a specific value in a row and another column in pandas?
I can't think of a great way to ask this in one sentence, so I'll show what I want to do.
Let's say I have a table in which each row is an event fired by someone going through a book rental process. There are 2 events: basket (contains the books they want) and checkout (fired when checkout is successful and the books are rented). I want to group by name, but also by each checkout event together with every basket event before it. Here's an example group, grouped only on name="tim".
|-------------------------------------------------|
| time | name | stage | payload |
|-------------------------------------------------|
| 1000 | tim | basket | harrypotter;twilight; |
|-------------------------------------------------|
| 1001 | tim | basket | harrypotter; |
|-------------------------------------------------|
| 1002 | tim | checkout | Order# 123456789 |
|-------------------------------------------------|
| 1003 | tim | basket | pandasfordummies; |
|-------------------------------------------------|
| 1004 | tim | checkout | Order# 145246263 |
|-------------------------------------------------|
My question is: how can I group the rows so that each group contains exactly one checkout event, like this:
First Order
|-------------------------------------------------|
| time | name | stage | payload |
|-------------------------------------------------|
| 1000 | tim | basket | harrypotter;twilight; |
|-------------------------------------------------|
| 1001 | tim | basket | harrypotter; |
|-------------------------------------------------|
| 1002 | tim | checkout | Order# 123456789 |
|-------------------------------------------------|
Second Order
|-------------------------------------------------|
| time | name | stage | payload |
|-------------------------------------------------|
| 1003 | tim | basket | pandasfordummies; |
|-------------------------------------------------|
| 1004 | tim | checkout | Order# 145246263 |
|-------------------------------------------------|
Sorry if this is worded terribly.
Assuming your table is in a pandas DataFrame and is already sorted by time and name, you can use the following code:
import numpy as np
import pandas as pd
# Rebuild the example table from the question.
df = pd.DataFrame({'time': [1000, 1001, 1002, 1003, 1004],
                   'name': ['tim', 'tim', 'tim', 'tim', 'tim'],
                   'stage': ['basket', 'basket', 'checkout', 'basket', 'checkout'],
                   'payload': ['harrypotter;twilight;', 'harrypotter;', 'Order# 123456789', 'pandasfordummies;', 'Order# 145246263']})
# Split right after each checkout row (row positions of the checkouts, plus 1).
orders = np.split(df, np.where(df.stage == 'checkout')[0] + 1)
This creates a list of the split dataframes in orders, which you can index as usual: orders[0], orders[1], etc.
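As a possible alternative, here is a minimal sketch that stays inside pandas and groups per name. It is not part of the original answer: the helper name split_orders and the order_id column are made up for illustration. The np.split call above cuts on row position only, so it leaves an empty final piece when the last row is a checkout and does not keep different names apart. The sketch instead numbers each row by how many checkouts the same name has already completed, then groups on name and that number.

```python
import pandas as pd

df = pd.DataFrame({
    "time": [1000, 1001, 1002, 1003, 1004],
    "name": ["tim", "tim", "tim", "tim", "tim"],
    "stage": ["basket", "basket", "checkout", "basket", "checkout"],
    "payload": ["harrypotter;twilight;", "harrypotter;", "Order# 123456789",
                "pandasfordummies;", "Order# 145246263"],
})


def split_orders(frame):
    """Hypothetical helper: one dataframe per (name, order)."""
    frame = frame.sort_values(["name", "time"])
    is_checkout = frame["stage"].eq("checkout").astype(int)
    # Running count of checkouts per name, minus the current row's own flag,
    # gives the number of checkouts strictly before each row. A checkout row
    # therefore shares its number with the basket rows leading up to it.
    order_id = is_checkout.groupby(frame["name"]).cumsum() - is_checkout
    labelled = frame.assign(order_id=order_id)  # order_id is an illustrative column name
    return [group for _, group in labelled.groupby(["name", "order_id"])]


orders = split_orders(df)
for order in orders:
    print(order, end="\n\n")
```

Note that basket rows with no later checkout for the same name would end up in a final group without a checkout event, so you may want to filter those out depending on what you need.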