How to group rows based on specific value in a row and another column in pandas?

I can't think of a great way to ask this in one sentence, so I'll show what I want to do.

Let's say I have a table in which each row is an event fired by someone going through a book rental process. There are two events: basket (contains the books they want) and checkout (fired when checkout succeeds and the books are rented). I want to group by name, but also by each checkout event together with the basket events that precede it. Here's an example group, grouped only on name="tim".

|-------------------------------------------------|
| time | name |  stage   |        payload         |
|-------------------------------------------------|
| 1000 | tim  |  basket  | harrypotter;twilight;  |
|-------------------------------------------------|
| 1001 | tim  |  basket  | harrypotter;           |
|-------------------------------------------------|
| 1002 | tim  | checkout | Order# 123456789       |
|-------------------------------------------------|
| 1003 | tim  |  basket  | pandasfordummies;      |
|-------------------------------------------------|
| 1004 | tim  | checkout | Order# 145246263       |
|-------------------------------------------------|

My question is: how can I group so that each group contains exactly one checkout event, like this:

First Order

|-------------------------------------------------|
| time | name |  stage   |        payload         |
|-------------------------------------------------|
| 1000 | tim  |  basket  | harrypotter;twilight;  |
|-------------------------------------------------|
| 1001 | tim  |  basket  | harrypotter;           |
|-------------------------------------------------|
| 1002 | tim  | checkout | Order# 123456789       |
|-------------------------------------------------|

Second Order

|-------------------------------------------------|
| time | name |  stage   |        payload         |
|-------------------------------------------------|
| 1003 | tim  |  basket  | pandasfordummies;      |
|-------------------------------------------------|
| 1004 | tim  | checkout | Order# 145246263       |
|-------------------------------------------------|

Sorry if this is worded terribly.

Assuming your table is in a pandas DataFrame and already sorted so that each person's events are contiguous and in time order (that is, by name and then by time), you can use the following code:

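import numpy as np
import pandas as pd

# Rebuild the example table from the question.
df = pd.DataFrame({'time': [1000, 1001, 1002, 1003, 1004],
                   'name': ['tim', 'tim', 'tim', 'tim', 'tim'],
                   'stage': ['basket', 'basket', 'checkout', 'basket', 'checkout'],
                   'payload': ['harrypotter;twilight;', 'harrypotter;', 'Order# 123456789',
                               'pandasfordummies;', 'Order# 145246263']})

# Positions of the checkout rows; adding 1 makes each split fall just after a checkout.
split_points = np.where(df.stage == 'checkout')[0] + 1

# Split the frame at those positions. Because the last row here is itself a
# checkout, the final piece of the split is an empty DataFrame.
orders = np.split(df, split_points)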

This creates a list of the split DataFrames in orders, which you can index normally: orders[0], orders[1], etc.
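
If you would rather stay within pandas (calling np.split on a DataFrame may emit a deprecation warning on recent pandas versions), here is a minimal sketch of the same idea using groupby. The column name order_id is my own choice and not something from the question. The idea is to count, within each name, how many checkouts occurred strictly before each row; rows sharing that count belong to the same order.

```python
import pandas as pd

df = pd.DataFrame({
    'time': [1000, 1001, 1002, 1003, 1004],
    'name': ['tim', 'tim', 'tim', 'tim', 'tim'],
    'stage': ['basket', 'basket', 'checkout', 'basket', 'checkout'],
    'payload': ['harrypotter;twilight;', 'harrypotter;', 'Order# 123456789',
                'pandasfordummies;', 'Order# 145246263'],
})

# Make each person's events contiguous and ordered by time.
df = df.sort_values(['name', 'time']).reset_index(drop=True)

# 1 for checkout rows, 0 otherwise.
checkout = df['stage'].eq('checkout').astype(int)

# Running checkout count per name, minus the current row's own flag, gives
# the number of checkouts strictly before the row. A checkout row therefore
# stays in the same group as the basket rows that precede it.
# 'order_id' is a hypothetical helper column introduced for this sketch.
df['order_id'] = checkout.groupby(df['name']).cumsum() - checkout

# One DataFrame per (name, order_id) pair.
orders = [group for _, group in df.groupby(['name', 'order_id'])]
```

With the example data this gives two groups for tim and no trailing empty frame. Any basket rows that come after a person's last checkout would end up in a group of their own with no checkout in it, so filter those out if you only want completed orders.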
