Pandas - Rolling window count with condition
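In a Pandas rolling window count, how can I count only the rows that meet a given condition? I haven't found a solution in the documentation or in other questions.
Using the following dataframe: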
import datetime
import random

import pandas as pd

random.seed(42)

# 500 random timestamps within a two-minute span, sorted for use as the index
date_0 = datetime.datetime(2020, 1, 1, 0, 0, 0, 0)
dates = [date_0 + datetime.timedelta(seconds=random.uniform(0, 120)) for i in range(500)]
dates.sort()

# Synthetic measurements: pressure is roughly speed squared plus noise
speeds = [random.uniform(1, 10) for i in range(500)]
speeds.sort()
pressures = [i**2 + random.normalvariate(0, 1) for i in speeds]
df = pd.DataFrame(data=list(zip(speeds, pressures)), columns=['speed', 'pressure'], index=dates)

# Flag a random 60% of the rows as controlled
dates_1 = random.sample(dates, int(len(dates) * 0.6))
df.loc[:, 'controlled'] = False
df.loc[dates_1, 'controlled'] = True

# Count all observations in a trailing 1-second window
df.loc[:, 'rolling_obs_count'] = df.loc[:, 'speed'].rolling(window='1s').count()
print('df: \n', df)
df:
                                speed    pressure  controlled  rolling_obs_count
2020-01-01 00:00:00.048713   1.024082    4.491483        True                1.0
2020-01-01 00:00:00.068628   1.084577    0.773474        True                2.0
2020-01-01 00:00:00.202953   1.091360    0.584872        True                3.0
2020-01-01 00:00:00.425483   1.125268    2.361184       False                4.0
2020-01-01 00:00:00.665378   1.127335    2.050226        True                5.0
...                               ...         ...         ...                ...
2020-01-01 00:01:59.531574   9.945263   98.811644        True                5.0
2020-01-01 00:01:59.534566   9.976833   99.481287       False                6.0
2020-01-01 00:01:59.704513   9.990121   99.452698       False                6.0
2020-01-01 00:01:59.814533   9.996152   99.479074        True                6.0
2020-01-01 00:01:59.913896   9.999170  100.584748        True                7.0
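The count function counts every row in the rolling window, but I only need to count the rows where the 'controlled' column is True. How can I do that?
Maybe it's as simple as: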
>>> df.rolling('1s')['controlled'].sum()
2020-01-01 00:00:00.048713 1.0
2020-01-01 00:00:00.068628 2.0
2020-01-01 00:00:00.202953 3.0
2020-01-01 00:00:00.425483 3.0
2020-01-01 00:00:00.665378 4.0
...
2020-01-01 00:01:59.531574 5.0
2020-01-01 00:01:59.534566 5.0
2020-01-01 00:01:59.704513 4.0
2020-01-01 00:01:59.814533 4.0
2020-01-01 00:01:59.913896 5.0
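The sum works as a count because each True contributes 1 and each False contributes 0. Here is a minimal sketch of the same idea on a small hand-made frame, so the numbers can be checked by eye. The frame `demo`, its values, and the `speed > 5` condition are made up for illustration and are not from the question. The explicit `.astype(int)` cast is an assumption added for safety, not something the one-liner above requires.

```python
import pandas as pd

# Hypothetical five-row frame with a DatetimeIndex, standing in for the question's df.
idx = pd.to_datetime([
    "2020-01-01 00:00:00.0",
    "2020-01-01 00:00:00.4",
    "2020-01-01 00:00:00.8",
    "2020-01-01 00:00:01.2",
    "2020-01-01 00:00:02.5",
])
demo = pd.DataFrame(
    {
        "speed": [1.0, 6.0, 7.0, 2.0, 8.0],
        "controlled": [True, False, True, True, False],
    },
    index=idx,
)

# Cast the boolean flag to 0/1 so that summing over the trailing
# 1-second window counts only the rows where the flag is True.
controlled_count = demo["controlled"].astype(int).rolling("1s").sum()

# The same pattern applies to any boolean condition built from the columns.
fast_count = (demo["speed"] > 5).astype(int).rolling("1s").sum()

print(pd.DataFrame({"controlled_count": controlled_count, "fast_count": fast_count}))
```

With the default window bounds (open on the left, closed on the right), the row at 00:00:01.2 sees the rows at 0.4, 0.8 and 1.2 seconds but not the one at 0.0.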