I would like to count the values in a column per time period in a pandas DataFrame.
My table:
id1 date_time adress a_size
reom 2005-8-20 22:51:10 75157.5413 ceifwekd
reom 2005-8-20 22:55:25 3571.37946 ceifwekd
reom 2005-8-20 11:21:01 3571.37946 tnohcve
reom 2005-8-20 11:29:09 97439.219 tnohcve
penr 2005-8-20 17:07:16 97439.219 ceifwekd
penr 2005-8-20 19:10:37 7391.6258 ceifwekd
....
I need:
id1 time_period num_of_address
reom 2005-8-20 22:50:00 - 23:00:00 2
reom 2005-8-20 11:20:00 - 11:30:00 2
penr 2005-8-20 17:00:00 - 17:10:00 1
My code: I have created a new column to get hours from the date_time.
df['num_per_10_minutes'] = df['id1'].map(df.groupby('id1', 'hours').apply(lambda x: x['date_time'].count()))
But this is not what I want. I need to count the number of "address" entries per 10-minute interval.
Thanks
Make an interval column first, then use pandas.DataFrame.groupby:
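import pandas as pd
df['date_time'] = pd.to_datetime(df['date_time'])
df = df.set_index('date_time', drop=True).sort_index()
# Width of one interval; a plain integer cannot be added to a Timestamp
step = pd.Timedelta(minutes=10)
# Label each row with the 10-minute interval that contains it (start inclusive, end exclusive)
df['intervals'] = ["%s - %s" % (i, i + step)
                   for d in df.index
                   for i in pd.date_range('2005-08-20', '2005-08-21', freq='10min')
                   if i <= d < i + step]
df.groupby(['id1', 'intervals'])['adress'].count().reset_index()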
Output:
id1 intervals adress
0 penr 2005-08-20 17:00:00 - 2005-08-20 17:10:00 1
1 penr 2005-08-20 19:10:00 - 2005-08-20 19:20:00 1
2 reom 2005-08-20 11:20:00 - 2005-08-20 11:30:00 2
3 reom 2005-08-20 22:50:00 - 2005-08-20 23:00:00 2
First aggregate the counts with GroupBy.size, binning the timestamps with Series.dt.floor:
df['date_time'] = pd.to_datetime(df['date_time'])
df = df.groupby(['id1', df['date_time'].dt.floor('10Min')]).size().reset_index(name='adress')
print (df)
id1 date_time adress
0 penr 2005-08-20 17:00:00 1
1 penr 2005-08-20 19:10:00 1
2 reom 2005-08-20 11:20:00 2
3 reom 2005-08-20 22:50:00 2
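For anyone who wants to run this step without the original data, here is a minimal self-contained sketch. It assumes a DataFrame rebuilt by hand from the six rows shown in the question (with the month zero-padded and only the two columns the count needs); the variable names `bins` and `counts` are illustrative and not part of the answer above.

```python
import pandas as pd

# Sample rows retyped from the question; only the columns needed for counting.
df = pd.DataFrame({
    "id1": ["reom", "reom", "reom", "reom", "penr", "penr"],
    "date_time": [
        "2005-08-20 22:51:10", "2005-08-20 22:55:25",
        "2005-08-20 11:21:01", "2005-08-20 11:29:09",
        "2005-08-20 17:07:16", "2005-08-20 19:10:37",
    ],
})
df["date_time"] = pd.to_datetime(df["date_time"])

# Round every timestamp down to the start of its 10-minute bin,
# then count the rows per (id1, bin) pair.
bins = df["date_time"].dt.floor("10min")
counts = df.groupby(["id1", bins]).size().reset_index(name="adress")
print(counts)
```

Flooring assigns each timestamp to exactly one bin, so a row that falls on a boundary such as 17:10:00 is counted only in the interval that starts there.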
Then change the format of the datetimes with Series.dt.strftime, appending the end of each 10-minute interval:
df['date_time'] = (df['date_time'].dt.strftime('%Y-%m-%d %H:%M:%S') +
(df['date_time'] + pd.Timedelta(10, unit='min')).dt.strftime(' - %H:%M:%S'))
print (df)
id1 date_time adress
0 penr 2005-08-20 17:00:00 - 17:10:00 1
1 penr 2005-08-20 19:10:00 - 19:20:00 1
2 reom 2005-08-20 11:20:00 - 11:30:00 2
3 reom 2005-08-20 22:50:00 - 23:00:00 2
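Or, to repeat the full date at the end of the interval as well, apply this to the aggregated result instead:
df['date_time'] = (df['date_time'].dt.strftime('%Y-%m-%d %H:%M:%S') +
                   (df['date_time'] + pd.Timedelta(10, unit='min'))
                   .dt.strftime(' - %Y-%m-%d %H:%M:%S'))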
print (df)
id1 date_time adress
0 penr 2005-08-20 17:00:00 - 2005-08-20 17:10:00 1
1 penr 2005-08-20 19:10:00 - 2005-08-20 19:20:00 1
2 reom 2005-08-20 11:20:00 - 2005-08-20 11:30:00 2
3 reom 2005-08-20 22:50:00 - 2005-08-20 23:00:00 2
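The two label formats differ only in how the end of the interval is printed, which matters for the last bin of a day. The sketch below illustrates this on two hypothetical bin starts (they are not taken from the question's data); `starts`, `ends`, `short` and `full` are names chosen for the example.

```python
import pandas as pd

# Hypothetical bin starts; the second one is the last 10-minute bin of the day.
starts = pd.Series(pd.to_datetime(["2005-08-20 22:50:00", "2005-08-20 23:50:00"]))
ends = starts + pd.Timedelta(minutes=10)

# Short label: the end of the interval shows only the time.
short = starts.dt.strftime("%Y-%m-%d %H:%M:%S") + ends.dt.strftime(" - %H:%M:%S")
# Full label: the end of the interval repeats the date.
full = starts.dt.strftime("%Y-%m-%d %H:%M:%S") + ends.dt.strftime(" - %Y-%m-%d %H:%M:%S")

print(short.tolist())
print(full.tolist())
```

With the short format, the 23:50 bin ends at "00:00:00" with no sign that the date has changed, so the full format is probably the safer choice if the data can span midnight.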