I need to calculate the total number of clients for a day from a DataFrame in which the values rise and fall. But here is the catch.
Suppose I have a DataFrame like this:
DATETIME CLIENTS
2018-03-03 08:00:00 1
2018-03-03 09:00:00 2
2018-03-03 10:00:00 3
2018-03-03 11:00:00 4
2018-03-03 12:00:00 5
2018-03-03 13:00:00 3
2018-03-03 14:00:00 4
2018-03-03 15:00:00 5
The max total number of clients for this day is 7, because the count rises to 5 at 12:00:00, then the value decreases the next hour BUT we do not subtract from 5. It then rises to 4 at 14:00:00, so we ADD 1, and to 5 at 15:00:00, so we ADD another 1. So in total there are 7 max clients throughout the day.
I have tried cumsum() and MAX(), as I thought these would be useful, but alas...
I need to implement this either in SQL or Python. Would appreciate any help!
Your logic is that you only want to count the incoming visitors, not the leaving ones. If you take diff() within each day, the increases are positive and the decreases are negative. So we can just replace the negatives with 0 and sum again.
Let's try:
# assumes df.DATETIME already has a datetime dtype
dates = df.DATETIME.dt.normalize()
max_visitors = (df.groupby(dates)['CLIENTS'].diff()  # hour-to-hour change within each day
                  .fillna(df['CLIENTS'])             # first record of each day counts in full
                  .clip(lower=0)                     # replace negatives with 0
                  .groupby(dates).sum()              # sum by day
               )
Output:
DATETIME
2018-03-03 7.0
Name: CLIENTS, dtype: float64
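For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand, so the construction of df is an assumption about how the data is loaded. The final cast to int is optional and only there to drop the .0.

```python
import pandas as pd

# Sample data copied from the question; building df this way is an assumption
df = pd.DataFrame({
    "DATETIME": pd.to_datetime([
        "2018-03-03 08:00:00", "2018-03-03 09:00:00",
        "2018-03-03 10:00:00", "2018-03-03 11:00:00",
        "2018-03-03 12:00:00", "2018-03-03 13:00:00",
        "2018-03-03 14:00:00", "2018-03-03 15:00:00",
    ]),
    "CLIENTS": [1, 2, 3, 4, 5, 3, 4, 5],
})

# Truncate each timestamp to midnight so it can serve as a per-day key
dates = df["DATETIME"].dt.normalize()

increases = (
    df.groupby(dates)["CLIENTS"].diff()  # change from the previous record in the same day
    .fillna(df["CLIENTS"])               # first record of each day counts in full
    .clip(lower=0)                       # ignore decreases
)

# Sum the increases per day; cast to int since the counts are whole numbers
max_visitors = increases.groupby(dates).sum().astype(int)
print(max_visitors)
```

With the sample data this gives 7 for 2018-03-03, matching the expected result in the question.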
If your version of MySQL is 8.0+, you can use the LAG() window function with aggregation:
select
  sum(case when clients > prev then clients - prev end) total
from (
  select *, lag(clients, 1, 0) over (order by datetime) prev
  from tablename
  where date(datetime) = '2018-03-03'
) t
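If you do not have a MySQL 8.0 instance handy, the same query can be tried from Python with the standard-library sqlite3 module. This is only a sketch: it swaps in SQLite for MySQL (window functions need SQLite 3.25 or newer), and the table layout below is an assumption based on the sample data in the question.

```python
import sqlite3

# Sample rows copied from the question
rows = [
    ("2018-03-03 08:00:00", 1), ("2018-03-03 09:00:00", 2),
    ("2018-03-03 10:00:00", 3), ("2018-03-03 11:00:00", 4),
    ("2018-03-03 12:00:00", 5), ("2018-03-03 13:00:00", 3),
    ("2018-03-03 14:00:00", 4), ("2018-03-03 15:00:00", 5),
]

conn = sqlite3.connect(":memory:")
# Assumed schema: timestamps stored as text, client counts as integers
conn.execute("create table tablename (datetime text, clients integer)")
conn.executemany("insert into tablename values (?, ?)", rows)

query = """
select
  sum(case when clients > prev then clients - prev end) total
from (
  select *, lag(clients, 1, 0) over (order by datetime) prev
  from tablename
  where date(datetime) = '2018-03-03'
) t
"""

# lag(..., 1, 0) makes the first row compare against 0, so it counts in full
total = conn.execute(query).fetchone()[0]
print(total)
conn.close()
```

On the sample data this prints 7. To get one total per day instead of filtering on a single date, one option would be to partition the window by date(datetime) and group the outer query by the same expression.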