Python: selecting rows by hour in a dataframe
I have the dataframe below in a CSV file, and I would like to select all rows corresponding to the current hour:
time,values
2018-10-28 08:16:49.469508,48
2018-10-28 08:16:54.471987,48
2018-10-28 08:16:59.475236,48
2018-10-28 08:17:04.478681,48
Below is the code I am trying:
import datetime
import pandas as pd

current = datetime.datetime.now()
start = datetime.datetime(current.year, current.month, current.day, current.hour, 0)
end = datetime.datetime(current.year, current.month, current.day, current.hour, 59)
df = pd.read_csv('water_data1.csv', parse_dates=[0], index_col=0)
print(df.query('start < time < end'))
I get the following error:
pandas.core.computation.ops.UndefinedVariableError: name 'start' is not defined
Could someone suggest the right syntax to achieve this? Thanks, Hemanth
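You can try a boolean mask (this assumes time is a regular column; with index_col=0 as in the question, time becomes the index, so compare df.index instead):
df[(df['time'] > start) & (df['time'] < end)]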
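A minimal sketch of the mask approach applied to the question's own layout, where index_col=0 turns time into a DatetimeIndex. It assumes a fixed, hypothetical "current" time (08:30) so the result is reproducible, and adds one hypothetical row at 09:00 to show that the next hour is excluded. It uses a half-open window [start, end) so timestamps between 08:59:00 and 09:00:00 are not dropped, which a minute=59 end bound would do.

```python
import io

import pandas as pd

# The question's sample rows, plus one hypothetical row from the next hour.
CSV = """time,values
2018-10-28 08:16:49.469508,48
2018-10-28 08:16:54.471987,48
2018-10-28 08:16:59.475236,48
2018-10-28 08:17:04.478681,48
2018-10-28 09:00:00.000000,50
"""

df = pd.read_csv(io.StringIO(CSV), parse_dates=[0], index_col=0)

# Hypothetical fixed "current" time; in practice use pd.Timestamp('now').
now = pd.Timestamp('2018-10-28 08:30:00')
start = now.replace(minute=0, second=0, microsecond=0)  # 08:00:00
end = start + pd.Timedelta(hours=1)                     # 09:00:00

# `time` is the index here (index_col=0), so compare df.index, not df['time'].
mask = (df.index >= start) & (df.index < end)
selected = df[mask]
print(selected)
```

The 09:00:00 row is excluded because the upper bound is strict.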
pd.DataFrame.query requires external variables (names from the surrounding Python scope rather than columns) to be prefixed with @:
import pandas as pd

df = pd.DataFrame({'A': list(range(10))})
start, end = 3, 6
# @start and @end are Python variables; A is the column name
print(df.query('@start < A < @end'))
A
4 4
5 5
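The same @ syntax applied to the question's setup: a sketch assuming the CSV is read with index_col=0, so the index is named time and can be referenced by that name inside the query string. The 09:05 row is hypothetical, added only to show a row from another hour being filtered out. Using <= on the lower bound keeps a row stamped exactly on the hour.

```python
import io

import pandas as pd

# Two of the question's rows plus one hypothetical row from the next hour.
CSV = """time,values
2018-10-28 08:16:49.469508,48
2018-10-28 08:17:04.478681,48
2018-10-28 09:05:00.000000,50
"""

df = pd.read_csv(io.StringIO(CSV), parse_dates=[0], index_col=0)

start = pd.Timestamp('2018-10-28 08:00')
end = pd.Timestamp('2018-10-28 09:00')

# Inside the query string, `time` resolves to the index (it is named 'time'),
# while @start and @end are looked up in the calling Python scope.
in_hour = df.query('@start <= time < @end')
print(in_hour)
```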
You can also use pd.Series.between:
# inclusive='neither' excludes both endpoints (pandas >= 1.3; older versions use inclusive=False)
res = df[df['A'].between(start, end, inclusive='neither')]
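For the hourly case, between can express the half-open window [start, end) directly with inclusive='left'. This sketch assumes time is read as a regular column (no index_col), so df['time'] is a Series. The rows at 07:59:59.9 and 09:00:00 are hypothetical boundary cases, chosen to show which edges are kept.

```python
import io

import pandas as pd

# Hypothetical boundary rows around one of the question's timestamps.
CSV = """time,values
2018-10-28 07:59:59.900000,47
2018-10-28 08:16:49.469508,48
2018-10-28 09:00:00.000000,49
"""

# Here `time` stays a regular column (no index_col), so df['time'] is a Series.
df = pd.read_csv(io.StringIO(CSV), parse_dates=['time'])

start = pd.Timestamp('2018-10-28 08:00')
end = start + pd.Timedelta(hours=1)

# inclusive='left' gives [start, end): a row stamped exactly 09:00:00
# belongs to the next hour and is excluded.
# String values for `inclusive` require pandas >= 1.3.
res = df[df['time'].between(start, end, inclusive='left')]
print(res)
```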
Finally, when working with datetime values, you should prefer pd.Timestamp over regular Python types:
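now = pd.Timestamp('now')
# Truncate to the start of the current hour; the window is [start, end)
start = now.replace(minute=0, second=0, microsecond=0)
end = start + pd.Timedelta(hours=1)
print((start, end))
# Then select with df.query('@start <= time < @end')
(Timestamp('2018-11-01 17:00:00'), Timestamp('2018-11-01 18:00:00'))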
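An alternative sketch that avoids computing an end bound: floor every timestamp, and the reference time, to the hour, then compare for equality. It assumes the question's layout (time as the DatetimeIndex) and a hypothetical fixed reference time of 08:30; the 08:59:59.999999 and 09:00:00 rows are hypothetical boundary cases.

```python
import io

import pandas as pd

# One of the question's rows plus two hypothetical boundary rows.
CSV = """time,values
2018-10-28 08:16:49.469508,48
2018-10-28 08:59:59.999999,48
2018-10-28 09:00:00.000000,50
"""

df = pd.read_csv(io.StringIO(CSV), parse_dates=[0], index_col=0)

# Hypothetical reference time; in practice use pd.Timestamp('now').
now = pd.Timestamp('2018-10-28 08:30:00')

# Truncate each index timestamp and the reference time to the start of
# their hour; rows match when they fall in the same clock hour.
same_hour = df[df.index.floor('h') == now.floor('h')]
print(same_hour)
```

This trades two comparisons for a floor over the whole index; for very large frames the explicit [start, end) mask avoids creating that intermediate index.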