Python PANDAS: Groupby Transform First Occurrence of Condition
I have a dataframe in the following general format:
customer_id,transaction_dt,product,price,units
1,2004-01-02 00:00:00,thing1,25,47
1,2004-01-17 00:00:00,thing2,150,8
2,2004-01-29 00:00:00,thing2,150,25
3,2017-07-15 00:00:00,thing3,55,17
3,2016-05-12 00:00:00,thing3,55,47
4,2012-02-23 00:00:00,thing2,150,22
4,2009-10-10 00:00:00,thing1,25,12
4,2014-04-04 00:00:00,thing2,150,2
5,2008-07-09 00:00:00,thing2,150,43
5,2004-01-30 00:00:00,thing1,25,40
5,2004-01-31 00:00:00,thing1,25,22
5,2004-02-01 00:00:00,thing1,25,2
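A minimal sketch for loading the sample above into a DataFrame, assuming the data arrives as CSV text. The CSV variable name and the explicit sort are illustrative and not from the question. As pasted, the rows for customers 3, 4 and 5 are not in date order, so the sketch sorts by customer_id and then transaction_dt explicitly:

```python
import io

import pandas as pd

# Sample rows from the question, held as CSV text (hypothetical variable name)
CSV = """customer_id,transaction_dt,product,price,units
1,2004-01-02 00:00:00,thing1,25,47
1,2004-01-17 00:00:00,thing2,150,8
2,2004-01-29 00:00:00,thing2,150,25
3,2017-07-15 00:00:00,thing3,55,17
3,2016-05-12 00:00:00,thing3,55,47
4,2012-02-23 00:00:00,thing2,150,22
4,2009-10-10 00:00:00,thing1,25,12
4,2014-04-04 00:00:00,thing2,150,2
5,2008-07-09 00:00:00,thing2,150,43
5,2004-01-30 00:00:00,thing1,25,40
5,2004-01-31 00:00:00,thing1,25,22
5,2004-02-01 00:00:00,thing1,25,2
"""

# parse_dates turns transaction_dt into datetime64 so sorting is chronological
df = pd.read_csv(io.StringIO(CSV), parse_dates=["transaction_dt"])

# Order rows by customer, then by date; renumber the index to match the new order
df = df.sort_values(["customer_id", "transaction_dt"]).reset_index(drop=True)
print(df)
```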
I have it sorted by the relevant fields in ascending order. Now I am trying to figure out how to check for a criterion inside each group and create a new indicator flag only for the first time it occurs. As a toy example, I am trying to figure out something like this to start:
conditions = (df['units'] > 20) | (df['price'] > 50)
# placeholder: transform() still needs a function argument
df['flag'] = df[conditions].groupby(['customer_id']).transform()
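A minimal sketch of one way to make the transform idea work, assuming "first time it occurs" means the earliest row per customer_id (by transaction_dt) that meets the OR condition above. It swaps in a per-customer running count of matches computed with transform("cumsum") instead of filtering first; the names CSV and running are illustrative:

```python
import io

import pandas as pd

# Sample rows from the question (hypothetical variable name)
CSV = """customer_id,transaction_dt,product,price,units
1,2004-01-02 00:00:00,thing1,25,47
1,2004-01-17 00:00:00,thing2,150,8
2,2004-01-29 00:00:00,thing2,150,25
3,2017-07-15 00:00:00,thing3,55,17
3,2016-05-12 00:00:00,thing3,55,47
4,2012-02-23 00:00:00,thing2,150,22
4,2009-10-10 00:00:00,thing1,25,12
4,2014-04-04 00:00:00,thing2,150,2
5,2008-07-09 00:00:00,thing2,150,43
5,2004-01-30 00:00:00,thing1,25,40
5,2004-01-31 00:00:00,thing1,25,22
5,2004-02-01 00:00:00,thing1,25,2
"""
df = pd.read_csv(io.StringIO(CSV), parse_dates=["transaction_dt"])
df = df.sort_values(["customer_id", "transaction_dt"]).reset_index(drop=True)

# The question's condition, kept as a full-length boolean mask
conditions = (df["units"] > 20) | (df["price"] > 50)

# Running count of matching rows within each customer, aligned to df's index
running = conditions.astype(int).groupby(df["customer_id"]).transform("cumsum")

# First occurrence: the row matches and it is the first match seen so far
df["flag"] = conditions & (running == 1)
print(df)
```

Computing the running count on the full-length mask, rather than on df[conditions], keeps one value per original row, so the result lines up with df and can be assigned as a column.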
Any help on how best to formulate this properly would be most welcome!
Assuming you want the first chronological appearance of each customer_id among the rows that meet your condition, you can use query, groupby, and first:
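(
    df.sort_values("transaction_dt")     # oldest transactions first
    .query("units > 20 & price > 50")    # keep rows that pass both tests
    .groupby("customer_id")              # one group per customer
    .first()                             # first non-null value of each column per group
)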
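For a concrete check, here is the chain above run on the question's sample (the CSV text and the name result are illustrative). Only customers 2 to 5 appear in the output, because neither of customer 1's rows has both units > 20 and price > 50:

```python
import io

import pandas as pd

# Sample rows from the question (hypothetical variable name)
CSV = """customer_id,transaction_dt,product,price,units
1,2004-01-02 00:00:00,thing1,25,47
1,2004-01-17 00:00:00,thing2,150,8
2,2004-01-29 00:00:00,thing2,150,25
3,2017-07-15 00:00:00,thing3,55,17
3,2016-05-12 00:00:00,thing3,55,47
4,2012-02-23 00:00:00,thing2,150,22
4,2009-10-10 00:00:00,thing1,25,12
4,2014-04-04 00:00:00,thing2,150,2
5,2008-07-09 00:00:00,thing2,150,43
5,2004-01-30 00:00:00,thing1,25,40
5,2004-01-31 00:00:00,thing1,25,22
5,2004-02-01 00:00:00,thing1,25,2
"""
df = pd.read_csv(io.StringIO(CSV), parse_dates=["transaction_dt"])

# Same chain as the answer; inside query(), & binds like "and"
result = (
    df.sort_values("transaction_dt")
    .query("units > 20 & price > 50")
    .groupby("customer_id")
    .first()
)
print(result)
```

The output is a new frame indexed by customer_id, one row per customer, rather than a flag column on df.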
Note: With the filter above, the example data you provided doesn't actually have more than one matching row per customer_id, but the syntax will work in either case.
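The note depends on the filter: with the question's OR condition instead, most customers have several matching rows. A sketch, assuming the goal is still a flag on the original rows, that keeps the filter-then-group idea but uses groupby(...).head(1) and maps the chosen rows back with Index.isin (the names CSV, matches, match_counts and first_idx are illustrative):

```python
import io

import pandas as pd

# Sample rows from the question, left in their pasted order (hypothetical variable name)
CSV = """customer_id,transaction_dt,product,price,units
1,2004-01-02 00:00:00,thing1,25,47
1,2004-01-17 00:00:00,thing2,150,8
2,2004-01-29 00:00:00,thing2,150,25
3,2017-07-15 00:00:00,thing3,55,17
3,2016-05-12 00:00:00,thing3,55,47
4,2012-02-23 00:00:00,thing2,150,22
4,2009-10-10 00:00:00,thing1,25,12
4,2014-04-04 00:00:00,thing2,150,2
5,2008-07-09 00:00:00,thing2,150,43
5,2004-01-30 00:00:00,thing1,25,40
5,2004-01-31 00:00:00,thing1,25,22
5,2004-02-01 00:00:00,thing1,25,2
"""
df = pd.read_csv(io.StringIO(CSV), parse_dates=["transaction_dt"])

# The question's OR condition; matching rows put in date order
conditions = (df["units"] > 20) | (df["price"] > 50)
matches = df[conditions].sort_values("transaction_dt")

# How many rows match per customer
match_counts = matches.groupby("customer_id").size()

# Index labels of each customer's earliest matching row
first_idx = matches.groupby("customer_id").head(1).index

# Flag those rows in the original, unsorted frame
df["flag"] = df.index.isin(first_idx)
print(match_counts)
print(df)
```

head(1) keeps each group's first row as a whole, whereas first() takes the first non-null value column by column; with no missing values in this sample, both pick the same rows.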