
Pandas - Rolling window count with condition

In a Pandas rolling window count, how can I count only the rows that satisfy a given condition? I have not found a solution in the documentation or in other questions.

With the following dataframe:

import datetime
import random

import pandas as pd

random.seed(42)
date_0 = datetime.datetime(2020, 1, 1, 0, 0, 0, 0)
# 500 sorted timestamps spread over two minutes
dates = [date_0 + datetime.timedelta(seconds=random.uniform(0, 120)) for i in range(500)]
dates.sort()
speeds = [random.uniform(1, 10) for i in range(500)]
speeds.sort()
pressures = [i**2 + random.normalvariate(0, 1) for i in speeds]
df = pd.DataFrame(data=list(zip(speeds, pressures)), columns=['speed', 'pressure'], index=dates)
# Flag a random 60% of the rows as controlled
dates_1 = random.sample(dates, int(len(dates) * 0.6))
df.loc[:, 'controlled'] = False
df.loc[dates_1, 'controlled'] = True
# Count all observations in a trailing 1-second window
df.loc[:, 'rolling_obs_count'] = df.loc[:, 'speed'].rolling(window='1s').count()
print('df: \n', df)
df: 
                                speed    pressure  controlled  rolling_obs_count
2020-01-01 00:00:00.048713  1.024082    4.491483        True                1.0
2020-01-01 00:00:00.068628  1.084577    0.773474        True                2.0
2020-01-01 00:00:00.202953  1.091360    0.584872        True                3.0
2020-01-01 00:00:00.425483  1.125268    2.361184       False                4.0
2020-01-01 00:00:00.665378  1.127335    2.050226        True                5.0
...                              ...         ...         ...                ...
2020-01-01 00:01:59.531574  9.945263   98.811644        True                5.0
2020-01-01 00:01:59.534566  9.976833   99.481287       False                6.0
2020-01-01 00:01:59.704513  9.990121   99.452698       False                6.0
2020-01-01 00:01:59.814533  9.996152   99.479074        True                6.0
2020-01-01 00:01:59.913896  9.999170  100.584748        True                7.0

The count function counts all rows within the rolling window, but I need to count only the rows where the "controlled" column is True. How can I do that?

Perhaps simply:

>>> df.rolling('1s')['controlled'].sum()
2020-01-01 00:00:00.048713    1.0
2020-01-01 00:00:00.068628    2.0
2020-01-01 00:00:00.202953    3.0
2020-01-01 00:00:00.425483    3.0
2020-01-01 00:00:00.665378    4.0
                             ... 
2020-01-01 00:01:59.531574    5.0
2020-01-01 00:01:59.534566    5.0
2020-01-01 00:01:59.704513    4.0
2020-01-01 00:01:59.814533    4.0
2020-01-01 00:01:59.913896    5.0
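
This works because True counts as 1 and False as 0, so a rolling sum over a boolean column is the number of True rows in each window. Below is a minimal sketch of the same idea on a small hand-made frame, where the windows can be checked by eye. The frame `demo`, its timestamps, and the column names `rolling_controlled_count` and `rolling_fast_count` are made up for illustration and are not part of the question's data. The explicit `astype(int)` is not required by the answer above; it only makes the boolean-to-number conversion visible.

```python
import pandas as pd

# Hypothetical data: five observations with irregular spacing
idx = pd.to_datetime([
    "2020-01-01 00:00:00.0",
    "2020-01-01 00:00:00.4",
    "2020-01-01 00:00:00.8",
    "2020-01-01 00:00:01.2",
    "2020-01-01 00:00:02.5",
])
demo = pd.DataFrame(
    {
        "speed": [1.0, 1.1, 1.2, 1.3, 1.4],
        "controlled": [True, False, True, True, False],
    },
    index=idx,
)

# Unconditional count: every row inside the trailing 1-second window
demo["rolling_obs_count"] = demo["speed"].rolling("1s").count()

# Conditional count: sum the 0/1 flags inside the same window
demo["rolling_controlled_count"] = (
    demo["controlled"].astype(int).rolling("1s").sum()
)

# The same pattern applies to any boolean condition, e.g. speed > 1.15
demo["rolling_fast_count"] = (
    (demo["speed"] > 1.15).astype(int).rolling("1s").sum()
)

print(demo)
```

With the default window closure, each window covers the interval (t - 1s, t], so the row at 00:00:02.5 sees only itself and gets a controlled count of 0, while its unconditional count is 1.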
