

Filter pandas dataframe, row value must be higher than an offset max value of the column, per customer

I have a dataframe of orders containing Customer ID, Order ID, Revenue and Order Date, like so:

Customer ID  Order ID  Revenue  Order Date
A            1         10       05-08-2022
B            2         10       04-07-2022
C            3         10       05-02-2022

And so forth. I am trying to copy this dataframe, keeping only the rows where a given customer's order date falls between that customer's latest order date and three months prior to it. I.e. the condition is variable for each row.

I've tried something like this:

df_filtered = df.loc[df['Order Date']>=(df.max(['Date Order']- DateOffset(months=3)))]

But I get the error "TypeError: unsupported operand type(s) for -: 'list' and 'DateOffset'".

I've also tried to create a separate dataframe where I've grouped by Customer ID and calculated the date 3 months prior to the latest purchase.

Like this:

Customer ID  Last_purchase_3M
A            05-05-2022
B            04-04-2022
C            05-12-2021

With the intention of doing something like this:

df_filtered = df.loc[df['Order Date']>=df_list['last_purchase_3M'] & df['Customer ID'] == df_list['Customer ID']]

But this gives me the error "TypeError: unsupported operand type(s) for &: 'int' and 'str'".

I clearly don't know what I am doing here (also, I'm new to this ;))

Am I on the right track, or is this completely wrong?

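There is a dtype issue here: the expression subtracts a DateOffset from the Python list ['Date Order'] (which also misspells the column name) rather than from the column's maximum date.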

df_filtered = df.loc[df['Order Date']>=(df.max(['Date Order']- DateOffset(months=3)))]

Try:

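import pandas as pd
from pandas import DateOffset
df['Order Date'] = pd.to_datetime(df['Order Date'], dayfirst=True)  # dates in the question are day-month-year
df_filtered = df.loc[df['Order Date'].ge(df['Order Date'].max() - DateOffset(months=3))]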
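
Note that `df['Order Date'].max()` is the latest date in the whole column, so every customer is compared against the same cutoff. If the cutoff should instead be relative to each customer's own latest order, as the question describes, one possible approach is `groupby(...).transform('max')`, which returns the per-customer maximum aligned to the original rows. The following is only a minimal sketch: the sample rows are made up for illustration, and the day-month-year format is an assumption inferred from the tables in the question.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's table
# (assumption: dates are day-month-year strings).
df = pd.DataFrame({
    "Customer ID": ["A", "A", "A", "B", "B"],
    "Order ID": [1, 2, 3, 4, 5],
    "Revenue": [10, 10, 10, 10, 10],
    "Order Date": ["05-08-2022", "20-06-2022", "01-01-2022",
                   "04-07-2022", "03-04-2022"],
})

# Parse the strings into real datetimes so date arithmetic works.
df["Order Date"] = pd.to_datetime(df["Order Date"], format="%d-%m-%Y")

# Latest order date per customer, repeated on every row of that customer.
latest = df.groupby("Customer ID")["Order Date"].transform("max")

# Per-row cutoff: three calendar months before that customer's latest order.
cutoff = latest - pd.DateOffset(months=3)

# Keep only rows on or after their own customer's cutoff.
df_filtered = df.loc[df["Order Date"] >= cutoff].copy()
print(df_filtered)
```

Because `transform` keeps the original index, `cutoff` lines up row by row with `df`, so no separate lookup dataframe or merge is needed.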
