
How do you isolate the rows in a DataFrame that are between two times for an if statement?

I have a dataframe with data for each hourly period for a year. I would like to create a new column that is zero in every row between 9:00 and 17:00 and, in rows outside this time range, holds the data from another column.

I believe that I want something like:

if '9.00' >= final_df.index <= '17.00':
    do some action
else:
    do another action

This is not working yet. The first reason is that the comparison is missing the full date. Is there some way I can get around that? The first line sort of works if I use:

if '2017-10-16 9.00' >= final_df.index <= '2017-10-16 17.00':

Is there a way I can get around this?

For reference, the first 5 data points are:

                        A      B  C  D  E
Timestamp
2017-10-15 13:30:00  59.9  17.14  0  1  0
2017-10-15 14:30:00  64.3  17.22  0  1  0
2017-10-15 15:30:00  68.6  17.18  0  1  0
2017-10-15 16:30:00  77.6  17.08  0  1  0
2017-10-15 17:30:00  74.5  16.93  0  1  0

You can use DatetimeIndex.hour to create a boolean mask and apply it to your DataFrame. For your given data, let's say that the region of interest is the hours 15 and 16 (from 15:00 up to, but not including, 17:00), and that you want to sum A inside the region and B outside it. You would do that through something like the following:

In [100]: mask = (df.index.hour > 14) & (df.index.hour < 17)

In [101]: df[mask].A.sum()
Out[101]: 146.2

In [102]: df[~mask].B.sum()
Out[102]: 51.29
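
As a self-contained sketch of the same idea applied to the 9:00 to 17:00 window from the question: because `.hour` only looks at the time of day, the comparison needs no date at all, which sidesteps the "missing full date" problem. The frame below is rebuilt from the five sample rows, and the half-open window (9:00 inclusive, 17:00 exclusive) is an assumption about what the question intends.

```python
import pandas as pd

# Rebuild the five sample rows shown in the question.
index = pd.DatetimeIndex(
    [
        "2017-10-15 13:30:00",
        "2017-10-15 14:30:00",
        "2017-10-15 15:30:00",
        "2017-10-15 16:30:00",
        "2017-10-15 17:30:00",
    ],
    name="Timestamp",
)
df = pd.DataFrame(
    {
        "A": [59.9, 64.3, 68.6, 77.6, 74.5],
        "B": [17.14, 17.22, 17.18, 17.08, 16.93],
        "C": 0,
        "D": 1,
        "E": 0,
    },
    index=index,
)

# Assumed window: 9:00 inclusive up to 17:00 exclusive, on any date.
hours = df.index.hour
mask = (hours >= 9) & (hours < 17)

inside = df[mask]    # rows whose hour falls in the window
outside = df[~mask]  # all remaining rows
```

With the sample data, only the 17:30 row ends up in `outside`. The mask replaces the `if`/`else`: each branch becomes an operation on `df[mask]` or `df[~mask]` rather than a row-by-row test.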

Edit: The task that has since been added to the question is also readily solvable with this approach, assuming that the column of interest is B:

In [117]: df['Result'] = ~mask * df.B

In [118]: df
Out[118]:
                        A      B  C  D  E  Result
Timestamp
2017-10-15 13:30:00  59.9  17.14  0  1  0   17.14
2017-10-15 14:30:00  64.3  17.22  0  1  0   17.22
2017-10-15 15:30:00  68.6  17.18  0  1  0    0.00
2017-10-15 16:30:00  77.6  17.08  0  1  0    0.00
2017-10-15 17:30:00  74.5  16.93  0  1  0   16.93
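
If the window boundaries need minute precision (for example, treating exactly 17:00 as inside), comparing `.hour` alone is too coarse. One possible alternative, sketched here under the assumption that both endpoints should be inclusive, is `DatetimeIndex.indexer_between_time`, which returns the integer positions of the rows inside the window regardless of date. The sample frame is rebuilt so the snippet runs on its own, and the `Result` column name follows the example above.

```python
import numpy as np
import pandas as pd

# Rebuild the five sample rows shown in the question.
index = pd.DatetimeIndex(
    [
        "2017-10-15 13:30:00",
        "2017-10-15 14:30:00",
        "2017-10-15 15:30:00",
        "2017-10-15 16:30:00",
        "2017-10-15 17:30:00",
    ],
    name="Timestamp",
)
df = pd.DataFrame(
    {
        "A": [59.9, 64.3, 68.6, 77.6, 74.5],
        "B": [17.14, 17.22, 17.18, 17.08, 16.93],
    },
    index=index,
)

# Integer positions of rows whose time of day lies in [09:00, 17:00].
positions = df.index.indexer_between_time("09:00", "17:00")

# Turn the positions into a boolean mask aligned with the frame.
in_window = np.zeros(len(df), dtype=bool)
in_window[positions] = True

# Keep B outside the window, write 0.0 inside it.
df["Result"] = df["B"].where(~in_window, 0.0)
```

Here the first four rows fall inside the window, so `Result` is zero for them and keeps the B value for the 17:30 row.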
