
How to get a dictionary or set of data within a particular time frame?

My data file has a DatetimeIndex, with date and time in the format 1900-01-01 07:35:23.253.

I have one million records, with multiple data points collected every minute.

datafile =
TIme                     datapoint1  datapoint2
1900-01-01 07:35:23.253  A           B
1900-01-01 07:35:23.253  B           BH
1900-01-01 08:35:23.253  V           gh
1900-01-01 09:35:23.253  u           90
1900-01-01 09:36:23.253  i           op
1900-01-01 10:36:23.253  y           op
1900-01-01 10:46:23.253  ir          op

My output should be the number of rows within each one-hour interval, like below:

07:00:00-08:00:00 - 2
08:00:00-09:00:00 - 1
09:00:00-10:00:00 - 2
10:00:00-11:00:00 - 2

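You can use pd.Grouper with freq='1H' to group the rows into one-hour bins and count them. To get the label format you want, use strftime together with pd.DateOffset(hours=1), which adds one hour to each bin start to give the end of the range (note: the resulting TIme column is a string, not a datetime):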

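import pandas as pd

# Parse the TIme column into real timestamps
df['TIme'] = pd.to_datetime(df['TIme'])
# Count the rows that fall into each one-hour bin
df = df.groupby(pd.Grouper(freq='1H', key='TIme'))['datapoint1'].count().reset_index()
# Build a "start-end" label per bin (this turns TIme into a string column)
df['TIme'] = (df['TIme'].astype(str) + '-' +
              (df['TIme'] + pd.DateOffset(hours=1)).dt.strftime('%H:%M:%S'))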
df
Out[1]: 
                           TIme  datapoint1
0  1900-01-01 07:00:00-08:00:00           2
1  1900-01-01 08:00:00-09:00:00           1
2  1900-01-01 09:00:00-10:00:00           2
3  1900-01-01 10:00:00-11:00:00           2
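
Since the question also asks for a dictionary, here is a minimal self-contained sketch that rebuilds the sample rows from the question and returns the hourly counts as a dict keyed by time-of-day labels. The variable names (counts, hourly_counts) are illustrative, not part of the original answer. It uses the lowercase alias 'h' because recent pandas releases deprecate the uppercase 'H' spelling, and size() instead of count() so that rows are counted even if datapoint1 is missing. Note that time-only labels would collide if the data spanned more than one day.

```python
import pandas as pd

# Sample rows copied from the question's table.
df = pd.DataFrame({
    "TIme": [
        "1900-01-01 07:35:23.253",
        "1900-01-01 07:35:23.253",
        "1900-01-01 08:35:23.253",
        "1900-01-01 09:35:23.253",
        "1900-01-01 09:36:23.253",
        "1900-01-01 10:36:23.253",
        "1900-01-01 10:46:23.253",
    ],
    "datapoint1": ["A", "B", "V", "u", "i", "y", "ir"],
    "datapoint2": ["B", "BH", "gh", "90", "op", "op", "op"],
})
df["TIme"] = pd.to_datetime(df["TIme"])

# Number of rows in each one-hour bin (size() ignores NaN handling entirely).
counts = df.groupby(pd.Grouper(key="TIme", freq="h")).size()

# Build "HH:MM:SS-HH:MM:SS" labels from each bin start and its end one hour later.
labels = [
    start.strftime("%H:%M:%S") + "-" + (start + pd.Timedelta(hours=1)).strftime("%H:%M:%S")
    for start in counts.index
]

hourly_counts = dict(zip(labels, counts.tolist()))
print(hourly_counts)
```

For the sample rows this gives 2, 1, 2 and 2 rows for the four hourly intervals starting at 07:00, matching the table above.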

If TIme is on the index, call df = df.reset_index() first so that TIme becomes a regular column, run the same code, and then call df = df.set_index('TIme') to put the labels back on the index:

# Setup assumed here (TIme is already a DatetimeIndex):
# df['TIme'] = pd.to_datetime(df['TIme'])
# df = df.set_index('TIme')
df = df.reset_index()  # move TIme from the index back to a column
df = df.groupby(pd.Grouper(freq='1H', key='TIme'))['datapoint1'].count().reset_index()
df['TIme'] = (df['TIme'].astype(str) + '-' +
              (df['TIme'] + pd.DateOffset(hours=1)).dt.strftime('%H:%M:%S'))
df = df.set_index('TIme')  # the index now holds the string labels
df
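
As an alternative when TIme is already the index, the reset_index / set_index round trip can be avoided by resampling the DatetimeIndex directly. The sketch below is an assumption-labeled illustration, not the original answer's method: df_indexed, per_hour, and window are hypothetical names, and the frame is rebuilt from the question's sample rows. It also shows one way to pull out the rows inside a single time frame with a boolean mask.

```python
import pandas as pd

# Hypothetical frame with the question's timestamps already on the index.
times = pd.to_datetime([
    "1900-01-01 07:35:23.253",
    "1900-01-01 07:35:23.253",
    "1900-01-01 08:35:23.253",
    "1900-01-01 09:35:23.253",
    "1900-01-01 09:36:23.253",
    "1900-01-01 10:36:23.253",
    "1900-01-01 10:46:23.253",
])
df_indexed = pd.DataFrame(
    {"datapoint1": ["A", "B", "V", "u", "i", "y", "ir"]},
    index=pd.DatetimeIndex(times, name="TIme"),
)

# Row count per one-hour bin, computed straight from the DatetimeIndex.
per_hour = df_indexed.resample("h").size()
print(per_hour)

# Rows inside one particular time frame: start inclusive, end exclusive.
start = pd.Timestamp("1900-01-01 09:00:00")
end = pd.Timestamp("1900-01-01 10:00:00")
window = df_indexed[(df_indexed.index >= start) & (df_indexed.index < end)]
print(window)
```

Here per_hour keeps real timestamps on its index, which is convenient for further time-based filtering; the string labels can still be built afterwards if they are only needed for display.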


 