
Counting occurrences in time interval in a pandas data frame

I have this simple data frame:

 Date and time        Event 
 --------------------------
 2020-03-23 9:05:03    A
 2020-03-23 14:06:02   B
 2020-03-23 9:06:43    B
 2020-03-23 12:11:50   D
 2020-03-23 12:12:38   D
 2020-03-23 12:13:17   B
 2020-03-23 12:14:07   A
 2020-03-23 12:14:54   A
 2020-04-29 10:37:09   A
 2020-04-29 10:39:13   A
 2020-04-29 11:53:33   A
 2020-04-29 12:04:46   C
 2020-04-30 19:15:29   D
 2020-04-30 16:18:04   B

I want to count the occurrences of each Event within 4-hour time intervals and create a new data frame.

I am trying to get something like this:

   10:00-14:00  14:00-18:00  18:00-22:00  22:00-02:00
A       2            1            3             0
B       0            1            1             2
C       1            2            1             1
D       0            0            0             2   

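I have tried aggregating with resample, extracting the time from the DateTime and then applying a count, and I have also tried different combinations with pd.TimeGrouper(), but none of these seem to work. I don't know how to set up those 4-hour intervals so that I can apply the aggregation.

I have searched all the related posts at this point, but could not find a solution.

Any suggestions would be greatly appreciated.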

You can try time bins:

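import pandas as pd

df['Date and time'] = pd.to_datetime(df['Date and time'])
# Bin edges in hours; hours before 10 fall outside every bin and become NaN
bins = [10, 14, 18, 20, 24]
labels = ['10:00-14:00', '14:00-18:00', '18:00-20:00', '20:00-24:00']
# right=False makes each bin closed on the left: [10, 14), [14, 18), ...
df['TimeBin'] = pd.cut(df['Date and time'].dt.hour, bins, labels=labels, right=False)
# Count events per (Event, TimeBin) pair
result = df.pivot_table(index='Event', columns='TimeBin', values='Date and time', aggfunc='count')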
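
For anyone who wants to run the binning idea end to end, here is a minimal self-contained sketch. It is my own variation and not the answer's exact code: it assumes six equal 4-hour bins covering the whole day (so the 9:xx rows are not dropped), zero-pads the sample times, and swaps pivot_table for groupby().size(). The names `rows`, `edges` and `counts` are just illustrative.

```python
import pandas as pd

# Sample data copied from the question (times zero-padded for unambiguous parsing)
rows = [
    ("2020-03-23 09:05:03", "A"), ("2020-03-23 14:06:02", "B"),
    ("2020-03-23 09:06:43", "B"), ("2020-03-23 12:11:50", "D"),
    ("2020-03-23 12:12:38", "D"), ("2020-03-23 12:13:17", "B"),
    ("2020-03-23 12:14:07", "A"), ("2020-03-23 12:14:54", "A"),
    ("2020-04-29 10:37:09", "A"), ("2020-04-29 10:39:13", "A"),
    ("2020-04-29 11:53:33", "A"), ("2020-04-29 12:04:46", "C"),
    ("2020-04-30 19:15:29", "D"), ("2020-04-30 16:18:04", "B"),
]
df = pd.DataFrame(rows, columns=["Date and time", "Event"])
df["Date and time"] = pd.to_datetime(df["Date and time"])

# Assumed layout: six 4-hour bins starting at midnight, i.e. [0, 4), [4, 8), ...
edges = list(range(0, 25, 4))
labels = [f"{lo:02d}:00-{hi:02d}:00" for lo, hi in zip(edges[:-1], edges[1:])]

# right=False keeps the left edge inside the bin; cast to str for plain labels
df["TimeBin"] = pd.cut(
    df["Date and time"].dt.hour, bins=edges, labels=labels, right=False
).astype(str)

# Count rows per (Event, TimeBin), then show every bin, including empty ones
counts = (
    df.groupby(["Event", "TimeBin"]).size()
    .unstack(fill_value=0)
    .reindex(columns=labels, fill_value=0)
)
print(counts)
```

The reindex step is only there so that bins with no events still show up as columns of zeros; drop it if you only care about bins that actually occur.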

Here is a way to do it with pandas .groupby(), .explode() and .pivot_table():

>>> import pandas as pd
>>> df = pd.DataFrame([i.strip().split('   ') for i in '''  2020-03-23 9:05:03   A
...  2020-03-23 14:06:02   B
...  2020-03-23 9:06:43   B
...  2020-03-23 12:11:50   D
...  2020-03-23 12:12:38   D
...  2020-03-23 12:13:17   B
...  2020-03-23 12:14:07   A
...  2020-03-23 12:14:54   A
...  2020-04-29 10:37:09   A
...  2020-04-29 10:39:13   A
...  2020-04-29 11:53:33   A
...  2020-04-29 12:04:46   C
...  2020-04-30 19:15:29   D
...  2020-04-30 16:18:04   B '''.split('\n')], columns=['Date and time', 'Event'])
>>> df
          Date and time Event
0    2020-03-23 9:05:03     A
1   2020-03-23 14:06:02     B
2    2020-03-23 9:06:43     B
3   2020-03-23 12:11:50     D
4   2020-03-23 12:12:38     D
5   2020-03-23 12:13:17     B
6   2020-03-23 12:14:07     A
7   2020-03-23 12:14:54     A
8   2020-04-29 10:37:09     A
9   2020-04-29 10:39:13     A
10  2020-04-29 11:53:33     A
11  2020-04-29 12:04:46     C
12  2020-04-30 19:15:29     D
13  2020-04-30 16:18:04     B
>>> # convert Date and time column to datetime type
>>> df['Date and time'] = pd.to_datetime(df['Date and time'])
>>> # groupby based on freq 4H
>>> df = df.groupby(pd.Grouper(key='Date and time', freq='4H')).agg(list).explode('Event')
>>> df = df.reset_index().dropna()
>>> # retrieve time value and convert it to time bins
>>> def time_binning(x):
...     return f'{x.time()} - {(x + pd.offsets.DateOffset(hours=3, minutes=59, seconds=59)).time()}'
...
>>> df['time'] = df['Date and time'].apply(time_binning)
>>> # pivot table
>>> df = df.pivot_table(index='Event', columns='time', aggfunc='count', fill_value=0)['Date and time']
>>> df
time   08:00:00 - 11:59:59  12:00:00 - 15:59:59  16:00:00 - 19:59:59
Event
A                        4                    2                    0
B                        1                    2                    1
C                        0                    1                    0
D                        0                    2                    1
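
The output above uses intervals anchored at midnight (08:00, 12:00, 16:00), while the header in the question starts at 10:00 and wraps past midnight (22:00-02:00). If that layout is what is wanted, one possible sketch is to shift the hour by the start offset before dividing into slots. This is an assumption about the intended result, not something either answer states; `START_HOUR`, `WIDTH` and `interval_label` are hypothetical names, and counts are pooled across all dates.

```python
import pandas as pd

START_HOUR = 10  # assumption: first interval starts at 10:00, as in the question's header
WIDTH = 4        # interval width in hours


def interval_label(hour):
    """Map an hour of the day to a label such as '22:00-02:00'."""
    slot = ((int(hour) - START_HOUR) % 24) // WIDTH  # slot index counted from START_HOUR
    lo = (START_HOUR + slot * WIDTH) % 24
    hi = (lo + WIDTH) % 24
    return f"{lo:02d}:00-{hi:02d}:00"


# Sample data from the question (times zero-padded)
df = pd.DataFrame({
    "Date and time": pd.to_datetime([
        "2020-03-23 09:05:03", "2020-03-23 14:06:02", "2020-03-23 09:06:43",
        "2020-03-23 12:11:50", "2020-03-23 12:12:38", "2020-03-23 12:13:17",
        "2020-03-23 12:14:07", "2020-03-23 12:14:54", "2020-04-29 10:37:09",
        "2020-04-29 10:39:13", "2020-04-29 11:53:33", "2020-04-29 12:04:46",
        "2020-04-30 19:15:29", "2020-04-30 16:18:04",
    ]),
    "Event": list("ABBDDBAAAAACDB"),
})

df["Interval"] = df["Date and time"].dt.hour.map(interval_label)

# Column order starting at START_HOUR: 10:00-14:00, 14:00-18:00, ..., 06:00-10:00
all_labels = [interval_label(START_HOUR + i * WIDTH) for i in range(24 // WIDTH)]

# crosstab counts Event/Interval pairs; reindex adds empty intervals as zeros
counts = pd.crosstab(df["Event"], df["Interval"]).reindex(columns=all_labels, fill_value=0)
print(counts)
```

Because only the hour is used, an event at 01:30 lands in 22:00-02:00 regardless of which calendar day it belongs to. If the intervals should stay tied to specific dates, the pd.Grouper approach above is the better fit.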

