Choose rows a fixed time-interval apart in Datetime-indexed pandas dataframe
I have a pandas dataframe indexed by DateTime, running from "00:00:00" until "23:59:00" (in one-minute increments; seconds are not counted).
in: df.index
out: DatetimeIndex(['2018-10-08 00:00:00', '2018-10-08 00:00:00',
'2018-10-08 00:00:00', '2018-10-08 00:00:00',
'2018-10-08 00:00:00', '2018-10-08 00:00:00',
'2018-10-08 00:00:00', '2018-10-08 00:00:00',
'2018-10-08 00:00:00', '2018-10-08 00:00:00',
...
'2018-10-08 23:59:00', '2018-10-08 23:59:00',
'2018-10-08 23:59:00', '2018-10-08 23:59:00',
'2018-10-08 23:59:00', '2018-10-08 23:59:00',
'2018-10-08 05:16:00', '2018-10-08 07:08:00',
'2018-10-08 13:58:00', '2018-10-08 09:30:00'],
dtype='datetime64[ns]', name='DateTime', length=91846, freq=None)
Now I want to choose specific intervals, say every 1 minute or every 1 hour, starting from "00:00:00", and retrieve all the rows that are that interval apart, consecutively.
I can grab entire intervals, say the first hour interval, with
df.between_time("00:00:00","01:00:00")
But I want to be able to
(a) get only the times that are a specific interval apart, and (b) get all the 1-hour intervals without having to manually ask for them 24 times.
How do I increment the DatetimeIndex inside the between_time command? Is there a better way than that?
I would solve this problem with masking rather than making new dataframes. For example, you can add a column df['which_one'] and set a different number for each subset. Then you can access a subset by calling df[df['which_one']==x], where x is the subset you want to select. You can still use other conditional statements, and just about everything else that pandas has to offer, when accessing the data this way.
PS: There are other methods to access the data that might be faster; I just used what I'm most comfortable with. Another way would be df[df['which_one'].eq(x)].
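As a minimal sketch of the masking idea, assuming the subsets are the hours of the day: the frame below is made-up stand-in data (the `value` column and the two rows per minute are assumptions, not the asker's data), and `which_one` is filled from the index rather than by hand.

```python
import pandas as pd

# Hypothetical stand-in for the question's data: minute-resolution
# timestamps with repeated index values (two rows per minute here).
idx = pd.date_range("2018-10-08 00:00", "2018-10-08 23:59", freq="min").repeat(2)
df = pd.DataFrame({"value": range(len(idx))}, index=idx)
df.index.name = "DateTime"

# Label each row with the hour it falls in (0..23).
df["which_one"] = df.index.hour

# Select one subset with a boolean mask; no extra dataframes are kept around.
first_hour = df[df["which_one"] == 0]

# For intervals other than one hour, the label can be the floored timestamp.
df["bucket_15min"] = df.index.floor("15min")
second_quarter = df[df["bucket_15min"] == pd.Timestamp("2018-10-08 00:15")]
```

Changing the string passed to `floor` (for example "5min" or "2h") changes the interval without touching the rest of the code.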
If you are dead set on separate dataframes, I would suggest doing so with a dictionary of dataframes, such as:
import pandas as pd

# Build a dictionary holding ten (empty) dataframes, keyed 0..9
dfdict = {}
for i in range(0, 10):
    dfdict[i] = pd.DataFrame()
print(dfdict)
As you will see, they are indeed dataframes:
out[1]
{0: Empty DataFrame
Columns: []
Index: [], 1: Empty DataFrame
Columns: []
Index: [], 2: Empty DataFrame
Columns: []
Index: [], 3: Empty DataFrame
Columns: []
Index: [], 4: Empty DataFrame
Columns: []
Index: [], 5: Empty DataFrame
Columns: []
Index: [], 6: Empty DataFrame
Columns: []
Index: [], 7: Empty DataFrame
Columns: []
Index: [], 8: Empty DataFrame
Columns: []
Index: [], 9: Empty DataFrame
Columns: []
Index: []}
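The dictionary above holds empty frames only. One way to fill such a dictionary with the hourly subsets, sketched here under the assumption that the key should be the hour of the day, is to iterate over a `groupby` on the index. The sample frame and its `value` column are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: one row per minute for a single day.
idx = pd.date_range("2018-10-08 00:00", "2018-10-08 23:59", freq="min")
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

# One dataframe per hour of the day, keyed by the hour (0..23).
dfdict = {hour: group for hour, group in df.groupby(df.index.hour)}

# Each entry is an ordinary dataframe holding that hour's rows.
print(len(dfdict))
print(dfdict[0].index.min(), dfdict[0].index.max())
```

This avoids calling between_time 24 times by hand, at the cost of keeping 24 separate objects in memory.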
Although, as others have suggested, there might be a more practical approach to solving your problem (difficult to say without more specifics of the issue).
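For instance, if part (a) of the question means "keep only the rows whose timestamp sits exactly on a fixed grid starting at 00:00:00", one possible sketch is to compare the index with its floored version. This is only one reading of the question, and the sample data and the `step` name are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data: two rows per minute for a single day.
idx = pd.date_range("2018-10-08 00:00", "2018-10-08 23:59", freq="min").repeat(2)
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Keep only rows whose timestamp lies exactly on a 1-hour boundary
# (00:00, 01:00, ...). Flooring leaves such timestamps unchanged,
# so the comparison is True only for rows on the grid.
step = "1h"
on_grid = df[df.index == df.index.floor(step)]

print(on_grid.index.unique())
```

Because the comparison is element-wise, duplicate timestamps are all kept, and the index does not need to be sorted.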