How to get a dictionary or set of data within a particular time frame?

My data file has a DatetimeIndex that holds the date and time in the format 1900-01-01 07:35:23.253.

I have one million records, with multiple data points collected every minute.

datafile =
TIme                       datapoint1   datapoint2
1900-01-01 07:35:23.253    A            B
1900-01-01 07:35:23.253    B            BH
1900-01-01 08:35:23.253    V            gh
1900-01-01 09:35:23.253    u            90
1900-01-01 09:36:23.253    i            op
1900-01-01 10:36:23.253    y            op
1900-01-01 10:46:23.253    ir           op

I want to count the rows that fall within each one-hour interval, so the output should look like this:

07:00:00-08:00:00 - 2
08:00:00-09:00:00 - 1
09:00:00-10:00:00 - 2
10:00:00-11:00:00 - 2

You can use pd.Grouper with freq='1H' to group the rows into hourly bins. To label each range, add one hour to the start of the bin with pd.DateOffset(hours=1) and format the result with strftime (note: the resulting TIme column is a string):

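import pandas as pd

df['TIme'] = pd.to_datetime(df['TIme'])
# Count rows per hourly bin (pandas 2.2+ prefers the lowercase alias '1h').
df = df.groupby(pd.Grouper(freq='1H', key='TIme'))['datapoint1'].count().reset_index()
# Build a "start-end" string label; the end is the bin start plus one hour.
df['TIme'] = (df['TIme'].astype(str) + '-' + 
              ((df['TIme'] + pd.DateOffset(hours=1)).dt.strftime('%H:%M:%S')).astype(str))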
df
Out[1]: 
                           TIme  datapoint1
0  1900-01-01 07:00:00-08:00:00           2
1  1900-01-01 08:00:00-09:00:00           1
2  1900-01-01 09:00:00-10:00:00           2
3  1900-01-01 10:00:00-11:00:00           2
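
For anyone who wants to try this without the original file, here is a minimal self-contained sketch that rebuilds the seven sample rows from the question and produces the time-only labels the question asked for. The names counts, window, rows and window_counts are made up for this illustration, and pd.offsets.Hour(1) is used in place of the '1H' string only to sidestep the alias spelling; treat it as one possible variant rather than the answer's exact code.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "TIme": [
        "1900-01-01 07:35:23.253",
        "1900-01-01 07:35:23.253",
        "1900-01-01 08:35:23.253",
        "1900-01-01 09:35:23.253",
        "1900-01-01 09:36:23.253",
        "1900-01-01 10:36:23.253",
        "1900-01-01 10:46:23.253",
    ],
    "datapoint1": ["A", "B", "V", "u", "i", "y", "ir"],
    "datapoint2": ["B", "BH", "gh", "90", "op", "op", "op"],
})
df["TIme"] = pd.to_datetime(df["TIme"])

# An offset object instead of the '1H' / '1h' string alias.
one_hour = pd.offsets.Hour(1)

# Count the rows in each hourly bin; "rows" is an illustrative column name.
counts = (
    df.groupby(pd.Grouper(key="TIme", freq=one_hour))["datapoint1"]
      .count()
      .reset_index(name="rows")
)

# Build "HH:MM:SS-HH:MM:SS" labels from the bin start and the start plus one hour.
start = counts["TIme"]
end = start + one_hour
counts["window"] = start.dt.strftime("%H:%M:%S") + "-" + end.dt.strftime("%H:%M:%S")

# Optional: a plain dictionary mapping each window label to its row count.
window_counts = dict(zip(counts["window"], counts["rows"]))
print(counts[["window", "rows"]])
```

Because the labels here drop the date, two different days would share the same label; keep the date in the label if the data spans more than one day.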

If TIme is on the index, call df = df.reset_index() before running the code, and then df = df.set_index('TIme') afterwards:

# df['TIme'] = pd.to_datetime(df['TIme'])
# df = df.set_index('TIme')
df = df.reset_index()
df = df.groupby(pd.Grouper(freq='1H', key='TIme'))['datapoint1'].count().reset_index()
df['TIme'] = (df['TIme'].astype(str) + '-' + 
              ((df['TIme'] + pd.DateOffset(hours=1)).dt.strftime('%H:%M:%S')).astype(str))
df = df.set_index('TIme')
df
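
If TIme is already a DatetimeIndex, another option is to resample on the index directly, which avoids the reset_index / set_index round trip. The sketch below is an alternative, not the method used above; it rebuilds the sample data with TIme as the index, and the name counts is chosen only for this example.

```python
import pandas as pd

# Sample timestamps from the question, used here as the index.
times = pd.to_datetime([
    "1900-01-01 07:35:23.253",
    "1900-01-01 07:35:23.253",
    "1900-01-01 08:35:23.253",
    "1900-01-01 09:35:23.253",
    "1900-01-01 09:36:23.253",
    "1900-01-01 10:36:23.253",
    "1900-01-01 10:46:23.253",
])
df = pd.DataFrame(
    {"datapoint1": ["A", "B", "V", "u", "i", "y", "ir"]},
    index=pd.DatetimeIndex(times, name="TIme"),
)

# resample bins on the DatetimeIndex; each result label is the start of an hour.
counts = df["datapoint1"].resample(pd.offsets.Hour(1)).count()
print(counts)
```

The result is a Series indexed by the start of each hour. Hours between the first and last timestamp that contain no rows still appear, with a count of 0.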
