How to get values in a pandas dataframe column between 2 times?
I have a dataset where the date_time column was separated into date and time, so that date can be used separately from time in different scenarios. But now I need to get the time values between 5:00 and 8:00. I can only find pandas functions for datetimes. Is there any way to get values from a time column ONLY?
I think part of the issue is the data type of the time column. I have tried removing the colon from the time value, so that 5:00 becomes 500, but I am still unable to select the values I need. I keep getting a KeyError on 'time'.
Here is what I tried so far:
# Get bird sightings between 5-8am. Remove the colon in time first.
early_birds_df = france_df['time'].str.replace(':','')
# Convert time to a numeric data type, so we can treat it like a number
early_birds_df['time'] = pd.to_numeric(early_birds_df['time'], errors='coerce')
early_birds_df.head()
But this returns an error:
KeyError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2897 try:
-> 2898 return self._engine.get_loc(casted_key)
2899 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()
KeyError: 'time'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
3 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2898 return self._engine.get_loc(casted_key)
2899 except KeyError as err:
-> 2900 raise KeyError(key) from err
2901
2902 if tolerance is not None:
KeyError: 'time'
Here is a snippet of the data to use as an example. I want to use the `time` column, and the data has an index of sorts. Let's say I want to retrieve all rows with times between 1:00 and 3:10. What code can I use to do that?
date time
1 8/15/2013 0:18
2 8/15/2013 0:48
3 8/15/2013 1:17
4 8/15/2013 1:47
5 8/15/2013 2:17
6 8/15/2013 2:47
7 8/15/2013 3:02
8 8/15/2013 3:17
9 8/15/2013 3:32
10 8/15/2013 3:47
If the bounds fall on whole hours, you can filter on the hour component (for your example of 5:00 and 8:00):
df[df["date_time"].dt.hour.between(5,8)]
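One thing to watch with the hour filter: between() is inclusive at both ends, so hour 8 also keeps times after 8:00. A minimal sketch on made-up timestamps (the sample values are an assumption, not the asker's data):

```python
import pandas as pd

# Hypothetical sample data; the column name "date_time" follows the answer.
df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2013-08-15 04:59", "2013-08-15 05:00", "2013-08-15 07:30",
        "2013-08-15 08:00", "2013-08-15 08:45", "2013-08-15 09:00",
    ])
})

# Series.between is inclusive on both ends by default,
# so every timestamp whose hour is 5, 6, 7 or 8 is kept.
by_hour = df[df["date_time"].dt.hour.between(5, 8)]
print(by_hour["date_time"].dt.strftime("%H:%M").tolist())
```

Here 08:45 is kept as well, so this approach only fits when "the whole 8 o'clock hour" is what you want.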
More generally, you can use pandas.DatetimeIndex.indexer_between_time, but this requires converting your timestamp series to a DatetimeIndex first, i.e.
df["date_time"].iloc[pd.DatetimeIndex(df["date_time"]).indexer_between_time("05:00", "08:00")]
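As a rough illustration of the indexer approach, here is a sketch on made-up timestamps (the sample values are an assumption). It also shows DataFrame.between_time, which does the same kind of selection once the timestamps are the index:

```python
import pandas as pd

# Hypothetical sample data; the column name "date_time" follows the answer.
df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2013-08-15 04:59", "2013-08-15 05:00", "2013-08-15 07:30",
        "2013-08-15 08:00", "2013-08-15 08:45", "2013-08-15 09:00",
    ])
})

# indexer_between_time returns integer positions, so pair it with iloc.
positions = pd.DatetimeIndex(df["date_time"]).indexer_between_time("05:00", "08:00")
selected = df.iloc[positions]

# Alternative: move the timestamps into the index and use between_time.
via_frame = df.set_index("date_time").between_time("05:00", "08:00")

print(selected)
```

Both ends are included by default, so 05:00 and 08:00 are kept while 08:45 is not.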
Or you can convert the times to their corresponding timedeltas since the start of the day, and then compare against timedelta values, e.g.
time = df["date_time"] - df["date_time"].dt.floor("D")
df[time.between(pd.Timedelta("05:00:00"), pd.Timedelta("08:00:00"))]
Edit
Just saw the new data format with a separate time column. In that case you can append seconds to the strings so that they work with to_timedelta, e.g.
pd.to_timedelta(df["time"] + ":00").between(pd.to_timedelta("05:00:00"), pd.to_timedelta("08:00:00"))
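Putting this together for the sample rows in the question, a minimal sketch for the 1:00 to 3:10 case might look like the following. The frame name france_df comes from the question; the variable names time_of_day, mask and selected are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question.
france_df = pd.DataFrame(
    {
        "date": ["8/15/2013"] * 10,
        "time": ["0:18", "0:48", "1:17", "1:47", "2:17",
                 "2:47", "3:02", "3:17", "3:32", "3:47"],
    },
    index=range(1, 11),
)

# Append a seconds field so the "H:MM" strings parse as timedeltas.
time_of_day = pd.to_timedelta(france_df["time"] + ":00")

# Boolean mask for 1:00 through 3:10, inclusive on both ends.
mask = time_of_day.between(pd.Timedelta("01:00:00"), pd.Timedelta("03:10:00"))

# Index the DataFrame (not the Series) with the mask to keep every column.
selected = france_df[mask]
print(selected)
```

The KeyError in the question appears to come from france_df['time'].str.replace(':', '') returning a Series rather than a DataFrame: early_birds_df['time'] then looks for the label 'time' in that Series' integer index. Keeping the whole frame and filtering it with a mask, as above, avoids that.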