Counting occurrences in time interval in a pandas data frame
I have this simple data frame:
Date and time Event
--------------------------
2020-03-23 9:05:03 A
2020-03-23 14:06:02 B
2020-03-23 9:06:43 B
2020-03-23 12:11:50 D
2020-03-23 12:12:38 D
2020-03-23 12:13:17 B
2020-03-23 12:14:07 A
2020-03-23 12:14:54 A
2020-04-29 10:37:09 A
2020-04-29 10:39:13 A
2020-04-29 11:53:33 A
2020-04-29 12:04:46 C
2020-04-30 19:15:29 D
2020-04-30 16:18:04 B
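I want to count the occurrences of each value in Event within 4-hour time intervals and create a new data frame.
I am trying to get something like this: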
     10:00-14:00  14:00-18:00  18:00-22:00  22:00-02:00
A              2            1            3            0
B              0            1            1            2
C              1            2            1            1
D              0            0            0            2
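I have tried aggregating with resample, then extracting Time from DateTime and applying a count, and I have also tried different combinations with pd.TimeGrouper(), but none of these seem to work. I don't know how to set up those 4-hour time intervals so that I can apply the aggregation.
At this point I have searched all the related posts but could not find a solution.
Any suggestions would be greatly appreciated.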
You can try time bins:
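import pandas as pd

# Parse the timestamps, then bin each row by its hour of day.
df['Date and time'] = pd.to_datetime(df['Date and time'])
bins = [10, 14, 18, 20, 24]
labels = ['10:00-14:00', '14:00-18:00', '18:00-20:00', '20:00-24:00']
# right=False gives half-open bins [start, end); hours before 10 fall outside the bins and become NaN.
df['TimeBin'] = pd.cut(df['Date and time'].dt.hour, bins, labels=labels, right=False)
# Count the rows for each (Event, TimeBin) pair.
result = df.pivot_table(index=['Event'], columns=['TimeBin'], aggfunc='count')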
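Note that the bins in this answer start at hour 10 and are not all four hours wide, so events before 10:00 are not assigned to any bin. Below is a minimal sketch of the same pd.cut idea with edges every four hours across the whole day. The sample rows are a subset of the question's data, and the names edges and counts are illustrative rather than part of the answer.

```python
import pandas as pd

# A few rows taken from the question's table (subset, for illustration only).
df = pd.DataFrame({
    "Date and time": pd.to_datetime([
        "2020-03-23 09:05:03", "2020-03-23 14:06:02", "2020-03-23 09:06:43",
        "2020-03-23 12:11:50", "2020-04-29 10:37:09", "2020-04-30 19:15:29",
    ]),
    "Event": ["A", "B", "B", "D", "A", "D"],
})

# Bin edges every 4 hours: 0, 4, 8, ..., 24.
edges = list(range(0, 25, 4))
labels = [f"{lo:02d}:00-{hi:02d}:00" for lo, hi in zip(edges[:-1], edges[1:])]

# right=False makes each bin half-open, [lo, hi), so 12:00 lands in 12:00-16:00.
df["TimeBin"] = pd.cut(df["Date and time"].dt.hour, bins=edges,
                       labels=labels, right=False)

# Count rows per (Event, TimeBin) and spread the bins into columns.
counts = (
    df.groupby(["Event", "TimeBin"], observed=False)
      .size()
      .unstack(fill_value=0)
)
print(counts)
```

Passing observed=False is intended to keep bins that contain no events as zero-filled columns, which is usually what a summary table like the one in the question needs.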
Here is an approach using pandas .groupby(), .explode() and .pivot_table():
>>> import pandas as pd
>>> # rsplit on the last space so the date and time stay together in one column
>>> df = pd.DataFrame([i.strip().rsplit(' ', 1) for i in ''' 2020-03-23 9:05:03 A
... 2020-03-23 14:06:02 B
... 2020-03-23 9:06:43 B
... 2020-03-23 12:11:50 D
... 2020-03-23 12:12:38 D
... 2020-03-23 12:13:17 B
... 2020-03-23 12:14:07 A
... 2020-03-23 12:14:54 A
... 2020-04-29 10:37:09 A
... 2020-04-29 10:39:13 A
... 2020-04-29 11:53:33 A
... 2020-04-29 12:04:46 C
... 2020-04-30 19:15:29 D
... 2020-04-30 16:18:04 B '''.split('\n')], columns=['Date and time', 'Event'])
>>> df
Date and time Event
0 2020-03-23 9:05:03 A
1 2020-03-23 14:06:02 B
2 2020-03-23 9:06:43 B
3 2020-03-23 12:11:50 D
4 2020-03-23 12:12:38 D
5 2020-03-23 12:13:17 B
6 2020-03-23 12:14:07 A
7 2020-03-23 12:14:54 A
8 2020-04-29 10:37:09 A
9 2020-04-29 10:39:13 A
10 2020-04-29 11:53:33 A
11 2020-04-29 12:04:46 C
12 2020-04-30 19:15:29 D
13 2020-04-30 16:18:04 B
>>> # convert Date and time column to datetime type
>>> df['Date and time'] = pd.to_datetime(df['Date and time'])
>>> # groupby based on freq 4H
>>> df = df.groupby(pd.Grouper(key='Date and time', freq='4H')).agg(list).explode('Event')
>>> df = df.reset_index().dropna()
>>> # retrieve time value and convert it to time bins
>>> def time_binning(x):
... return f'{x.time()} - {(x + pd.offsets.DateOffset(hours=3, minutes=59, seconds=59)).time()}'
...
>>> df['time'] = df['Date and time'].apply(time_binning)
>>> # pivot table
>>> df = df.pivot_table(index='Event', columns='time', aggfunc='count', fill_value=0)['Date and time']
>>> df
time 08:00:00 - 11:59:59 12:00:00 - 15:59:59 16:00:00 - 19:59:59
Event
A 4 2 0
B 1 2 1
C 0 1 0
D 0 2 1
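The Grouper-based bins are anchored at midnight (08:00, 12:00, 16:00), whereas the table in the question starts at 10:00 and wraps past midnight (22:00-02:00). The following is a sketch of one way to get that layout, assuming only the time of day matters and the date can be ignored. START_HOUR, WIDTH and slot_label are hypothetical names introduced here, not part of either answer.

```python
import pandas as pd

# The question's data, with times zero-padded.
rows = [
    ("2020-03-23 09:05:03", "A"), ("2020-03-23 14:06:02", "B"),
    ("2020-03-23 09:06:43", "B"), ("2020-03-23 12:11:50", "D"),
    ("2020-03-23 12:12:38", "D"), ("2020-03-23 12:13:17", "B"),
    ("2020-03-23 12:14:07", "A"), ("2020-03-23 12:14:54", "A"),
    ("2020-04-29 10:37:09", "A"), ("2020-04-29 10:39:13", "A"),
    ("2020-04-29 11:53:33", "A"), ("2020-04-29 12:04:46", "C"),
    ("2020-04-30 19:15:29", "D"), ("2020-04-30 16:18:04", "B"),
]
df = pd.DataFrame(rows, columns=["Date and time", "Event"])
df["Date and time"] = pd.to_datetime(df["Date and time"])

START_HOUR = 10  # assumption: the first bin starts at 10:00, as in the question
WIDTH = 4        # bin width in hours


def slot_label(k):
    """Turn a slot number (0..5) into a label such as '22:00-02:00'."""
    lo = (START_HOUR + int(k) * WIDTH) % 24
    hi = (lo + WIDTH) % 24
    return f"{lo:02d}:00-{hi:02d}:00"


# Shift the hour so START_HOUR maps to 0, wrap around midnight, then bucket.
slot = (df["Date and time"].dt.hour - START_HOUR) % 24 // WIDTH
df["TimeBin"] = slot.map(slot_label)

# Cross-tabulate events against the time-of-day bins.
counts = pd.crosstab(df["Event"], df["TimeBin"])
print(counts)
```

Because crosstab only sees the labels that actually occur, empty bins are omitted and the columns are sorted alphabetically. Reindexing the columns with the full list of labels would be one way to restore the chronological order and the missing bins.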