I have a Pandas DataFrame containing the visits to a website. It has two columns: the ID number and the date, in the format YYYY-mm-dd HH:mm:ss.
I would like to get a DataFrame containing the time difference between consecutive visits of each customer. I found how to get the number of visits using groupby, but I don't know how to do the rest.
Edit:
No.  IDs   date
1    4678  2012-11-30 23:59:59
2    4703  2012-11-30 23:59:23
3    4678  2012-11-30 23:58:46
4    5803  2012-11-30 23:58:19
5    4678  2012-11-30 23:58:07
And I would like to get, for each ID number, something like this:
IDs   Visit_number  duration since last visit
4678  1             0
      2             73s
      3             39s
For now I have only managed to calculate the number of visits for each ID number, with array.groupby(['IDs']).size()
To calculate the visit number, you can use groupby and cumcount:
In [76]: df['Visit_Number'] = df.groupby('IDs').cumcount() + 1
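Next, for the duration, you can use diff for each group. The result is negated because the rows are ordered from newest to oldest, so the raw differences come out negative: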
In [77]: df['duration'] = - df.groupby('IDs')['date'].diff()
In [78]: df
Out[78]:
   IDs   date                 Visit_Number  duration
0  4678  2012-11-30 23:59:59  1             NaT
1  4703  2012-11-30 23:59:23  1             NaT
2  4678  2012-11-30 23:58:46  2             00:01:13
3  5803  2012-11-30 23:58:19  1             NaT
4  4678  2012-11-30 23:58:07  3             00:00:39
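The two steps above can be reproduced end to end with a small self-contained sketch. The DataFrame is rebuilt by hand from the sample rows in the question, and it assumes, as in that sample, that the rows are already ordered from newest to oldest.

```python
import pandas as pd

# Sample rows from the question, newest visit first.
df = pd.DataFrame({
    "IDs": [4678, 4703, 4678, 5803, 4678],
    "date": pd.to_datetime([
        "2012-11-30 23:59:59",
        "2012-11-30 23:59:23",
        "2012-11-30 23:58:46",
        "2012-11-30 23:58:19",
        "2012-11-30 23:58:07",
    ]),
})

# Running visit counter within each ID (cumcount starts at 0).
df["Visit_Number"] = df.groupby("IDs").cumcount() + 1

# Difference to the previous row of the same ID; negated because the
# rows go backwards in time. The first row of each ID stays NaT.
df["duration"] = -df.groupby("IDs")["date"].diff()
```

If the rows were in some other order, the negation would no longer give the right sign, so this version only fits data sorted like the sample.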
This gives you the difference as a timedelta. To get it in seconds and fill the missing (NaT) values with 0:
In [79]: df['duration'] = df['duration'].dt.total_seconds().fillna(0).astype(int)
In [80]: df
Out[80]:
   IDs   date                 Visit_Number  duration
0  4678  2012-11-30 23:59:59  1             0
1  4703  2012-11-30 23:59:23  1             0
2  4678  2012-11-30 23:58:46  2             73
3  5803  2012-11-30 23:58:19  1             0
4  4678  2012-11-30 23:58:07  3             39
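When the row order is not guaranteed, one possible variation is to sort by ID and date first, so that no negation is needed. The sketch below is only an illustration: the shuffled sample data and the name `result` are assumptions, and visits are numbered chronologically here (oldest first), which is the reverse of the numbering in the output above.

```python
import pandas as pd

# Same visits as in the question, deliberately out of order.
df = pd.DataFrame({
    "IDs": [4678, 5803, 4678, 4703, 4678],
    "date": pd.to_datetime([
        "2012-11-30 23:58:46",
        "2012-11-30 23:58:19",
        "2012-11-30 23:59:59",
        "2012-11-30 23:59:23",
        "2012-11-30 23:58:07",
    ]),
})

# Sort chronologically within each ID so diff() is already positive.
df = df.sort_values(["IDs", "date"])
df["Visit_Number"] = df.groupby("IDs").cumcount() + 1

# Seconds since the previous visit; 0 for each ID's first visit.
df["duration"] = (
    df.groupby("IDs")["date"].diff().dt.total_seconds().fillna(0).astype(int)
)

# One row per (ID, visit number), similar to the layout asked for.
result = df.set_index(["IDs", "Visit_Number"])["duration"]
```

Indexing with `result.loc[4678]` then returns the durations for that one customer, keyed by visit number.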
Something like the following:
import pandas as pd

# Assumes a.csv has the "IDs" and "date" columns shown in the question.
a = pd.read_csv("a.csv", parse_dates=["date"])

# Sort chronologically, then print the gaps between visits for each ID.
for user_id, visits in a.sort_values("date").groupby("IDs"):
    print(user_id, visits["date"].diff())
Outputs:
4678 4 NaT
2 00:00:39
0 00:01:13
Name: date, dtype: timedelta64[ns]
4703 1 NaT
Name: date, dtype: timedelta64[ns]
5803 3 NaT
Name: date, dtype: timedelta64[ns]
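To try the loop without a file on disk, the sketch below feeds the sample rows through `io.StringIO`. The CSV layout (a header line `IDs,date`) is an assumption, since the contents of a.csv are not shown, and the `diffs` dictionary is a hypothetical name used here to keep the per-ID results instead of printing them.

```python
import io

import pandas as pd

# Assumed CSV layout; the real a.csv is not shown in the thread.
csv_text = """IDs,date
4678,2012-11-30 23:59:59
4703,2012-11-30 23:59:23
4678,2012-11-30 23:58:46
5803,2012-11-30 23:58:19
4678,2012-11-30 23:58:07
"""

a = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Map each ID to the Series of gaps between its consecutive visits.
diffs = {
    user_id: visits["date"].diff()
    for user_id, visits in a.sort_values("date").groupby("IDs")
}
```

Each Series keeps the original row labels, so `diffs[4678]` is indexed by 4, 2, 0, the same labels as in the printed output above.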