I am trying to create features for my ML work on grocery customer data.
The data contains the transactions that users make when buying groceries.
I am trying to find the names of the users who have made consecutive transactions within a 30-second time frame. This is important for building a profile of such users.
For example, if the data looks like this:
   User   Datetime             Amount
1  Mary   2020-11-30 10:10:20  24
2  Jacob  2020-11-30 12:12:12  43.2
3  Alice  2020-11-30 11:11:11  75.29
4  Mary   2020-11-30 10:10:45  34
5  Mary   2020-11-30 10:11:15  21
6  Alice  2020-11-30 11:11:41  100
the correct answer would be Alice, as only Alice had more than one transaction with every consecutive pair 30 seconds apart.
Mary might look like a probable answer, but not all of her consecutive transactions had a 30-second gap: her gaps were 25 and 30 seconds. So the correct answer we need is Alice.
One method is to use lag() to get the time of the previous transaction for the same user. The following returns the transactions that are within 30 seconds of that previous transaction:
select t.*
from (select t.*,
             lag(datetime) over (partition by user order by datetime) as prev_datetime
      from t
     ) t
-- >= rather than >, so that a gap of exactly 30 seconds (as in Alice's case) is included
where prev_datetime >= datetime - interval '30 second';
This syntax uses standard SQL; date/time functions vary among databases, so the exact syntax depends on the database you are using.
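For readers who want to check the logic outside a database, the lag() step can be sketched in plain Python. This is only an illustration under assumptions, not part of the query above: the sample rows are copied from the question, the names `rows`, `with_prev_datetime`, and `within_window` are hypothetical, and a gap of exactly 30 seconds is treated as inside the window.

```python
from datetime import datetime, timedelta

# Sample rows copied from the question: (user, datetime, amount).
rows = [
    ("Mary", "2020-11-30 10:10:20", 24),
    ("Jacob", "2020-11-30 12:12:12", 43.2),
    ("Alice", "2020-11-30 11:11:11", 75.29),
    ("Mary", "2020-11-30 10:10:45", 34),
    ("Mary", "2020-11-30 10:11:15", 21),
    ("Alice", "2020-11-30 11:11:41", 100),
]


def with_prev_datetime(rows):
    """Mimic lag(datetime) over (partition by user order by datetime)."""
    parsed = [
        (user, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), amount)
        for user, ts, amount in rows
    ]
    # Sorting by (user, datetime) plays the role of partition by / order by.
    parsed.sort(key=lambda r: (r[0], r[1]))
    result = []
    prev_user, prev_ts = None, None
    for user, ts, amount in parsed:
        # The first row of each user has no previous transaction (NULL in SQL).
        prev = prev_ts if user == prev_user else None
        result.append((user, ts, amount, prev))
        prev_user, prev_ts = user, ts
    return result


def within_window(rows, seconds=30):
    """Keep transactions made at most `seconds` after the user's previous one."""
    window = timedelta(seconds=seconds)
    return [
        r for r in with_prev_datetime(rows)
        if r[3] is not None and r[1] - r[3] <= window
    ]


for user, ts, amount, prev in within_window(rows):
    print(user, ts, amount, (ts - prev).total_seconds())
```

On the sample data this keeps one of Alice's transactions (gap of 30 seconds) and two of Mary's (gaps of 25 and 30 seconds), so the row-level filter alone does not separate the two users.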
It is unclear how you want to summarize this to get Alice but not Mary.
If you need all gaps between a user's consecutive transactions to be exactly 30 seconds, you can use:
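select user
from (select t.*,
             lag(datetime) over (partition by user order by datetime) as prev_datetime
      from t
     ) t
group by user
-- MySQL-style syntax: the comparison evaluates to 0/1 (or NULL) and is summed.
-- Users with a single transaction only have a NULL prev_datetime, so they are not returned.
having sum(prev_datetime <> datetime - interval 30 second) = 0;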
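The same per-user summary can be sketched in plain Python as a cross-check. This is a minimal sketch under assumptions rather than a translation of any particular database's behavior: the sample rows are copied from the question, the function name `users_with_exact_gaps` is hypothetical, and users with only one transaction are skipped because they have no gap to test.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sample rows copied from the question: (user, datetime, amount).
rows = [
    ("Mary", "2020-11-30 10:10:20", 24),
    ("Jacob", "2020-11-30 12:12:12", 43.2),
    ("Alice", "2020-11-30 11:11:11", 75.29),
    ("Mary", "2020-11-30 10:10:45", 34),
    ("Mary", "2020-11-30 10:11:15", 21),
    ("Alice", "2020-11-30 11:11:41", 100),
]


def users_with_exact_gaps(rows, seconds=30):
    """Return users whose consecutive transactions are all exactly `seconds` apart."""
    target = timedelta(seconds=seconds)
    times_by_user = defaultdict(list)
    for user, ts, _amount in rows:
        times_by_user[user].append(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"))

    selected = []
    for user, stamps in times_by_user.items():
        stamps.sort()
        # Gap between each transaction and the previous one for this user.
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        # Require at least one gap, so single-transaction users are skipped.
        if gaps and all(gap == target for gap in gaps):
            selected.append(user)
    return sorted(selected)


print(users_with_exact_gaps(rows))  # prints ['Alice']
```

Mary is excluded because one of her gaps is 25 seconds, and Jacob is excluded because he has a single transaction.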