How to get a dictionary or set of data within a particular time frame?
My data file has a DatetimeIndex holding a date and time in the format 1900-01-01 07:35:23.253.
It contains one million records, with multiple data points collected every minute.
datafile =
TIme                     datapoint1  datapoint2
1900-01-01 07:35:23.253  A           B
1900-01-01 07:35:23.253  B           BH
1900-01-01 08:35:23.253  V           gh
1900-01-01 09:35:23.253  u           90
1900-01-01 09:36:23.253  i           op
1900-01-01 10:36:23.253  y           op
1900-01-01 10:46:23.253  ir          op
I want the number of rows that fall within each one-hour interval, like this:
07:00:00-08:00:00 - 2
08:00:00-09:00:00 - 1
09:00:00-10:00:00 - 2
10:00:00-11:00:00 - 1
You can use pd.Grouper with freq='1H' to group the rows into one-hour bins (in pandas 2.2 and later, the uppercase 'H' alias is deprecated in favor of lowercase 'h'). Then use strftime to get the format you want and pd.DateOffset(hours=1) to add one hour for the end of each range (note: the resulting TIme column is a string):
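import pandas as pd

# Parse the TIme column, then count the rows in each one-hour bin.
df['TIme'] = pd.to_datetime(df['TIme'])
df = df.groupby(pd.Grouper(freq='1H', key='TIme'))['datapoint1'].count().reset_index()
# Build a "start-end" label; strftime already returns strings.
df['TIme'] = (df['TIme'].astype(str) + '-' +
              (df['TIme'] + pd.DateOffset(hours=1)).dt.strftime('%H:%M:%S'))
df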
Out[1]:
                           TIme  datapoint1
0  1900-01-01 07:00:00-08:00:00           2
1  1900-01-01 08:00:00-09:00:00           1
2  1900-01-01 09:00:00-10:00:00           2
3  1900-01-01 10:00:00-11:00:00           2
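Since the question also asks for a dictionary, here is a minimal, self-contained sketch of the same grouping that ends in a plain dict. The sample rows are copied from the question. The names hour, counts, labels and result are illustrative only, and pd.offsets.Hour(1) is swapped in for the '1H' string so the sketch does not depend on which alias spelling a given pandas version accepts.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "TIme": [
        "1900-01-01 07:35:23.253",
        "1900-01-01 07:35:23.253",
        "1900-01-01 08:35:23.253",
        "1900-01-01 09:35:23.253",
        "1900-01-01 09:36:23.253",
        "1900-01-01 10:36:23.253",
        "1900-01-01 10:46:23.253",
    ],
    "datapoint1": ["A", "B", "V", "u", "i", "y", "ir"],
    "datapoint2": ["B", "BH", "gh", "90", "op", "op", "op"],
})
df["TIme"] = pd.to_datetime(df["TIme"])

# One-hour bin width as an offset object (assumption: used here
# instead of the '1H' string from the answer).
hour = pd.offsets.Hour(1)

# Count the rows that fall into each one-hour bin.
counts = df.groupby(pd.Grouper(key="TIme", freq=hour))["datapoint1"].count()

# Label each bin as "start-end" using only the time of day.
labels = [f"{start:%H:%M:%S}-{start + hour:%H:%M:%S}" for start in counts.index]

# Plain dictionary mapping each interval label to its row count.
result = dict(zip(labels, counts.tolist()))
print(result)
```

The labels drop the date, so this form only makes sense when the data covers a single day; for multi-day data, keep the date in the label as the answer's code does.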
If TIme is on the index, you can call df = df.reset_index() before running the code and df = df.set_index('TIme') afterwards:
# df['TIme'] = pd.to_datetime(df['TIme'])
# df = df.set_index('TIme')
df = df.reset_index()
df = df.groupby(pd.Grouper(freq='1H', key='TIme'))['datapoint1'].count().reset_index()
df['TIme'] = (df['TIme'].astype(str) + '-' +
((df['TIme'] + pd.DateOffset(hours=1)).dt.strftime('%H:%M:%S')).astype(str))
df = df.set_index('TIme')
df
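As an alternative when TIme is already the index, the reset_index / set_index round trip can be skipped by resampling on the index directly. This is a sketch of a different technique (Series.resample) rather than the answer's own method. The sample index is rebuilt from the question's rows, and the names idx and counts are illustrative.

```python
import pandas as pd

# Rebuild the question's data with TIme as a DatetimeIndex.
idx = pd.to_datetime([
    "1900-01-01 07:35:23.253",
    "1900-01-01 07:35:23.253",
    "1900-01-01 08:35:23.253",
    "1900-01-01 09:35:23.253",
    "1900-01-01 09:36:23.253",
    "1900-01-01 10:36:23.253",
    "1900-01-01 10:46:23.253",
])
idx.name = "TIme"
df = pd.DataFrame({"datapoint1": ["A", "B", "V", "u", "i", "y", "ir"]}, index=idx)

# Resample on the index into one-hour bins and count the rows in each bin.
counts = df["datapoint1"].resample(pd.offsets.Hour(1)).count()
print(counts)
```

The result is a Series indexed by the start of each hour. Hours with no rows between the first and last bin appear with a count of 0.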