How to get a dictionary or dataset within a specific time range?

Question

My data file has a DatetimeIndex: each row's date and time is in the format 1900-01-01 07:35:23.253.

I have one million records, and multiple data points are collected every minute.

datafile =
TIme                       datapoint1   datapoint2
1900-01-01 07:35:23.253    A            B
1900-01-01 07:35:23.253    B            BH
1900-01-01 08:35:23.253    V            gh
1900-01-01 09:35:23.253    u            90
1900-01-01 09:36:23.253    i            op
1900-01-01 10:36:23.253    y            op
1900-01-01 10:46:23.253    ir           op
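To try the answer below on something concrete, the sample rows can be rebuilt as a pandas DataFrame. This is a minimal sketch: the column names TIme, datapoint1 and datapoint2 come from the table above, but the values are typed in by hand rather than read from the real data file, whose on-disk format is not shown.

```python
import pandas as pd

# Sample rows copied from the question; the real file has about a million rows.
raw = {
    "TIme": [
        "1900-01-01 07:35:23.253",
        "1900-01-01 07:35:23.253",
        "1900-01-01 08:35:23.253",
        "1900-01-01 09:35:23.253",
        "1900-01-01 09:36:23.253",
        "1900-01-01 10:36:23.253",
        "1900-01-01 10:46:23.253",
    ],
    "datapoint1": ["A", "B", "V", "u", "i", "y", "ir"],
    "datapoint2": ["B", "BH", "gh", "90", "op", "op", "op"],
}
df = pd.DataFrame(raw)

# Parse the strings with an explicit format; %f accepts the 3-digit milliseconds.
df["TIme"] = pd.to_datetime(df["TIme"], format="%Y-%m-%d %H:%M:%S.%f")

# Move the timestamps onto the index, matching the DatetimeIndex in the question.
df = df.set_index("TIme")
print(df)
```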

So my output should be as follows: I want the number of rows in each one-hour interval, like below:

07:00:00-08:00:00 - 2
08:00:00-09:00:00 - 1
09:00:00-10:00:00 - 2
10:00:00-11:00:00 - 1
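With a DatetimeIndex in place, one way to get per-hour row counts is DataFrame.resample followed by .size(). This is a sketch of an alternative technique, not the accepted answer's method; it assumes the timestamps are on the index and rebuilds a small df from the sample rows so it runs on its own. Counted this way, the 10:00 to 11:00 bucket of the sample holds two rows (10:36 and 10:46).

```python
import pandas as pd

# Timestamps from the sample data; every row counts once regardless of its values.
idx = pd.to_datetime([
    "1900-01-01 07:35:23.253",
    "1900-01-01 07:35:23.253",
    "1900-01-01 08:35:23.253",
    "1900-01-01 09:35:23.253",
    "1900-01-01 09:36:23.253",
    "1900-01-01 10:36:23.253",
    "1900-01-01 10:46:23.253",
])
df = pd.DataFrame({"datapoint1": ["A", "B", "V", "u", "i", "y", "ir"]}, index=idx)

# Fixed one-hour bins starting on the hour; size() counts rows per bin
# (an hour with no rows would appear with a count of 0).
counts = df.resample("1h").size()

# Turn each bin start into an "HH:MM:SS-HH:MM:SS" label like the desired output.
start = counts.index
end = start + pd.Timedelta(hours=1)
counts.index = start.strftime("%H:%M:%S") + "-" + end.strftime("%H:%M:%S")
print(counts)
```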

Answer 1

You can use pd.Grouper with freq='1H' to bin the rows by hour, pd.DateOffset(hours=1) to add one hour to each bin start, and strftime to format the end time the way you want, which gives a range label (note: the label is a string). In pandas 2.2 and later the hourly alias is spelled '1h'; '1H' still works there but emits a FutureWarning:

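import pandas as pd
# Parse the TIme strings into datetimes so they can be binned by hour
df['TIme'] = pd.to_datetime(df['TIme'])
# Bin rows into one-hour intervals and count the non-null datapoint1 values in each
df = df.groupby(pd.Grouper(freq='1H', key='TIme'))['datapoint1'].count().reset_index()
# Label each bin "start-end": the full start timestamp, then the end time one hour later
df['TIme'] = (df['TIme'].astype(str) + '-' +
              (df['TIme'] + pd.DateOffset(hours=1)).dt.strftime('%H:%M:%S'))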
df
Out[1]: 
                           TIme  datapoint1
0  1900-01-01 07:00:00-08:00:00           2
1  1900-01-01 08:00:00-09:00:00           1
2  1900-01-01 09:00:00-10:00:00           2
3  1900-01-01 10:00:00-11:00:00           2
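Note that .count() on datapoint1 counts the non-missing values in that column, not rows; if datapoint1 can be empty, .size() gives the row count the question asks for. Both also list hours that contain no rows, with a value of 0. A small sketch on hypothetical data (one missing datapoint1 value and no rows between 08:00 and 09:00) shows both effects:

```python
import pandas as pd

# Hypothetical data: datapoint1 is missing in one row, and 08:00-09:00 has no rows.
df = pd.DataFrame({
    "TIme": pd.to_datetime([
        "1900-01-01 07:35:23.253",
        "1900-01-01 07:40:00.000",
        "1900-01-01 09:10:00.000",
    ]),
    "datapoint1": ["A", None, "u"],
})

grouped = df.groupby(pd.Grouper(freq="1h", key="TIme"))["datapoint1"]

# count() skips the missing value, so the 07:00 bin reports 1 although it holds 2 rows.
per_hour_values = grouped.count()
# size() counts rows whether or not datapoint1 is missing: the 07:00 bin reports 2.
per_hour_rows = grouped.size()
# The empty 08:00 bin appears in both results with 0, because the bins are regular.
print(per_hour_values)
print(per_hour_rows)
```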

If TIme is the index, first run df = df.reset_index() before the code above, and then df = df.set_index('TIme') after it:

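# If TIme is still a string column, set up the index first:
# df['TIme'] = pd.to_datetime(df['TIme'])
# df = df.set_index('TIme')
# Move the TIme index back into a column so Grouper(key='TIme') can find it
df = df.reset_index()
df = df.groupby(pd.Grouper(freq='1H', key='TIme'))['datapoint1'].count().reset_index()
df['TIme'] = (df['TIme'].astype(str) + '-' +
              (df['TIme'] + pd.DateOffset(hours=1)).dt.strftime('%H:%M:%S'))
# Put the (now string) interval labels back on the index
df = df.set_index('TIme')
df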
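When the timestamps already form a DatetimeIndex, pd.Grouper(freq=...) without key= bins on the index itself, so the reset_index()/set_index() round trip is optional. A sketch, assuming df is indexed by TIme as in the question (a small df is typed in here so it runs); it swaps pd.Timedelta(hours=1) in for the answer's pd.DateOffset(hours=1), which gives the same result for a fixed one-hour shift:

```python
import pandas as pd

# Small stand-in for the real data, indexed by TIme.
df = pd.DataFrame(
    {"datapoint1": ["A", "B", "V", "u", "i", "y", "ir"]},
    index=pd.DatetimeIndex([
        "1900-01-01 07:35:23.253",
        "1900-01-01 07:35:23.253",
        "1900-01-01 08:35:23.253",
        "1900-01-01 09:35:23.253",
        "1900-01-01 09:36:23.253",
        "1900-01-01 10:36:23.253",
        "1900-01-01 10:46:23.253",
    ], name="TIme"),
)

# Without key=, the Grouper bins on the DatetimeIndex directly.
out = df.groupby(pd.Grouper(freq="1h"))["datapoint1"].count()

# Replace each bin start with a "start-end" string label, as in the answer.
start = out.index
end = start + pd.Timedelta(hours=1)
out.index = start.astype(str) + "-" + end.strftime("%H:%M:%S")
out.index.name = "TIme"
print(out)
```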

Solution 1
1 · Accepted · 2020-11-02 21:45:21