Cycle in a dataframe with dates and between dates in Python

I would like to process all the data between two dates, changing the dates at each step. In particular, I have the following dataframe:

                   real    model2      model1
date                                               
2017-01-01 00:00:00   51.22   52.776425   52.583711
2017-01-01 01:00:00   53.00   47.211506   50.679937
2017-01-01 02:00:00   52.00   44.722529   48.478772
2017-01-01 03:00:00   51.00   42.475170   45.141708
2017-01-01 04:00:00   47.27   38.862827   44.583250
2017-01-01 05:00:00   45.49   39.473972   44.930338
2017-01-01 06:00:00   45.69   42.465659   47.380179

where the dates are also the index. I would like to collect all the data day by day in a list to pass to a function. I have done it in a not very smart (or correct) way:

for iday in range(1,9):
   #
   #
   start_date = '2017-01-0'+str(iday)+ ' 00:00:00'
   end_date   = '2017-01-0'+str(iday)+ ' 23:00:00'
   #
   data_sub_e = EE.loc[start_date:end_date]

It does not seem correct: it is difficult to extend to a number of days greater than 10, and it does not seem to use any pandas features.

Is there a smarter way to do that?

Thanks in advance,

Diego

I assume that the date index is of datetime type (not string).

Using df.index.date you can select rows by the date part only.

E.g.:

d1 = pd.to_datetime('2017-01-01').date()  # The criterion date, as a datetime.date
df[df.index.date == d1]   # Get all rows from this date, whatever the hour part
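As a self-contained sketch of this selection, the snippet below builds a small hypothetical frame (the values and timestamps are made up for illustration) and picks one day's rows. It also shows partial-string indexing with .loc, which should select the same rows on a DatetimeIndex.

```python
import pandas as pd

# Hypothetical hourly data spanning two days (values are made up).
idx = pd.to_datetime(["2017-01-01 22:00", "2017-01-01 23:00",
                      "2017-01-02 00:00", "2017-01-02 01:00"])
df = pd.DataFrame({"real": [51.0, 53.0, 52.0, 47.0]}, index=idx)
df.index.name = "date"

# df.index.date holds plain datetime.date objects, so compare against one.
d1 = pd.to_datetime("2017-01-01").date()
same_day = df[df.index.date == d1]

# Partial-string indexing: a day-level string selects all rows of that day.
same_day_loc = df.loc["2017-01-01"]

print(same_day)
```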

Another hint: instead of your loop based on the day number:

for iday in range(1,9):

run a loop based on pd.date_range, something like:

for dat in pd.date_range('2017-01-01', '2017-01-15', freq='D'):

Of course, set the end date according to your needs.
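A minimal sketch of such a loop, assuming a frame named EE as in the question (the data here is made up, and the dict `daily` is a hypothetical container for the per-day lists). It uses a boolean mask on the normalized index, so a day with no rows simply yields an empty list instead of raising an error.

```python
import pandas as pd

# Hypothetical frame standing in for the question's EE (values are made up).
idx = pd.to_datetime(["2017-01-01 00:00", "2017-01-01 01:00",
                      "2017-01-02 00:00", "2017-01-03 05:00"])
EE = pd.DataFrame({"real": [51.22, 53.0, 52.0, 51.0]}, index=idx)

daily = {}  # hypothetical container: date -> list of that day's values
for dat in pd.date_range("2017-01-01", "2017-01-03", freq="D"):
    # normalize() sets every timestamp to midnight, so this keeps one day.
    data_sub_e = EE[EE.index.normalize() == dat]
    daily[dat.date()] = data_sub_e["real"].tolist()

for day, values in daily.items():
    print(day, values)
```

Because the loop runs over real dates rather than day numbers, it extends past the 9th of the month and across month boundaries without any string formatting.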

Another choice is to group your DataFrame by the date part of the index:

df.groupby(pd.Grouper(freq='D'))

and then apply your function to each group.
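One way this could look, as a sketch only: the function `daily_error` below is a hypothetical stand-in for whatever per-day function you need, and the data is made up. Iterating over the grouped object gives one (day, sub-frame) pair per daily bin.

```python
import pandas as pd

# Hypothetical data for two days (values are made up).
idx = pd.to_datetime(["2017-01-01 00:00", "2017-01-01 01:00",
                      "2017-01-02 00:00"])
df = pd.DataFrame({"real": [50.0, 52.0, 48.0],
                   "model1": [51.0, 49.0, 49.0]}, index=idx)

def daily_error(group):
    """Hypothetical per-day function: mean absolute error of model1."""
    return (group["model1"] - group["real"]).abs().mean()

results = {}
for day, group in df.groupby(pd.Grouper(freq="D")):
    if group.empty:
        continue  # days without any rows may show up as empty groups
    results[day.date()] = daily_error(group)

print(results)
```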

Edit following the comment

To change your values into lists, for each group, you can pass agg a dict that maps each column name to an aggregation function:

df.groupby(pd.Grouper(freq='D')).agg({'real': list,
    'model1': list, 'model2': list})

If you want to assign your own column names, you can use another syntax, with keyword arguments:

df.groupby(pd.Grouper(freq='D')).agg(Real=('real', list),
    Model_1=('model1', list), Model_2=('model2', list))

Here the parameter names specify the output column names. The value of each parameter is a tuple: (original column name, aggregation function).
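To see what the result looks like, here is a small sketch on made-up data (only two of the three columns are included to keep it short). Each row of the result is one day, and each cell holds that day's values as a Python list, ready to pass to a function.

```python
import pandas as pd

# Hypothetical data for two consecutive days (values are made up).
idx = pd.to_datetime(["2017-01-01 00:00", "2017-01-01 01:00",
                      "2017-01-02 00:00"])
df = pd.DataFrame({"real": [51.22, 53.0, 52.0],
                   "model1": [52.58, 50.68, 48.48]}, index=idx)

# Keyword name = output column; tuple = (source column, aggregation).
per_day = df.groupby(pd.Grouper(freq="D")).agg(
    Real=("real", list),
    Model_1=("model1", list),
)

# One row per day; each cell is the list of that day's values.
for day, row in per_day.iterrows():
    print(day.date(), row["Real"], row["Model_1"])
```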
