Change time series frequency, ffill values until the next input but with a limit
I have data with timestamps. I want to turn it into a 1-minute time series and fill the missing values in the newly created rows with the last input. However, I also need a limit on the forward fill: if the next input is missing for too long, the rows should stay NaN.
Data:
timestamp pay
2020-10-10 23:32 50
2020-10-11 21:55 80
2020-10-13 23:28 40
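A minimal sketch to reproduce the sample data, assuming the timestamps arrive as strings (for example, read from a CSV) and must be converted to datetimes before any resampling:

```python
import pandas as pd

# Sample data from the question; column names follow the table above
df = pd.DataFrame(
    {
        "timestamp": ["2020-10-10 23:32", "2020-10-11 21:55", "2020-10-13 23:28"],
        "pay": [50, 80, 40],
    }
)

# resample/asfreq need a DatetimeIndex, so parse the strings first
df["timestamp"] = pd.to_datetime(df["timestamp"])
```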
I want to convert it using df.set_index('timestamp').asfreq('1Min', method='ffill'), forward-filling the pay column until the next input, but if the next input is more than 24 hours (1440 rows) away, fill only up to 1440 rows.
So 2020-10-11 21:55 80 should be filled with 80 only until 2020-10-12 21:55, and the rows should then stay NaN until 2020-10-13 23:28 40.
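For reference, a sketch showing why asfreq alone does not meet the requirement: DataFrame.asfreq takes a fill method but no limit argument, so the entire gap after 2020-10-11 21:55 ends up filled with 80.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2020-10-10 23:32", "2020-10-11 21:55", "2020-10-13 23:28"]
        ),
        "pay": [50, 80, 40],
    }
)

# asfreq has a fill method but no limit, so every new row gets the last value
filled = df.set_index("timestamp").asfreq("1min", method="ffill")

# Even a row more than a day after 2020-10-11 21:55 still holds 80
print(filled.loc[pd.Timestamp("2020-10-13 23:27"), "pay"])
```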
How can I achieve this?
I think you can use resample and ffill with the limit option. Can you try this:
output = df.set_index('timestamp').sort_index().resample('1Min').ffill(limit=1440)
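A self-contained version of this answer that rebuilds the sample data. With limit=1440 the value 80 is held through 2020-10-12 21:55 inclusive (the observed row plus 1440 filled rows, 1441 in total), and the first NaN appears at 21:56:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2020-10-10 23:32", "2020-10-11 21:55", "2020-10-13 23:28"]
        ),
        "pay": [50, 80, 40],
    }
)

# Upsample to 1-minute bins; each observed value is carried forward into
# at most 1440 of the newly created rows that follow it
output = df.set_index("timestamp").sort_index().resample("1min").ffill(limit=1440)

print(output.loc[pd.Timestamp("2020-10-12 21:55"), "pay"])  # 80.0, last filled row
print(output.loc[pd.Timestamp("2020-10-12 21:56"), "pay"])  # nan
```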
Building on Clegane's very good answer, I would add that there is no need for sort_index(), and the limit should be 1439 (the observed value plus 1439 filled rows makes a full day of 1440 rows). Therefore:
output = df.set_index('timestamp').resample('1Min').ffill(limit=1439)
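A sketch that derives the limit from the frequency instead of hard-coding it, assuming a one-day window; the names rows_per_day and limit are only illustrative. With this limit the value 80 ends at 2020-10-12 21:54, so the observed row plus the filled rows cover exactly 1440 minutes:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2020-10-10 23:32", "2020-10-11 21:55", "2020-10-13 23:28"]
        ),
        "pay": [50, 80, 40],
    }
)

# Rows per 24 hours at 1-minute frequency, counting the observed row itself
rows_per_day = pd.Timedelta(days=1) // pd.Timedelta(minutes=1)
# ffill's limit counts only the new rows after the observation, hence the - 1
limit = rows_per_day - 1

output = df.set_index("timestamp").resample("1min").ffill(limit=limit)
print(limit)  # 1439
```

Changing the window (for example to 12 hours) or the frequency then only requires editing the two Timedelta values.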
Quality check
To make sure it works correctly:
output['pay'].value_counts()
Returns:
50.0 1343 # less than a day of range, so fully filled
80.0 1440 # over a day of range, so capped at 1440
40.0 1
Name: pay, dtype: int64
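value_counts() drops NaN by default, so the check above does not show how many rows were left empty. A sketch of a complementary check that counts the NaN rows as well (the names counts and n_missing are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2020-10-10 23:32", "2020-10-11 21:55", "2020-10-13 23:28"]
        ),
        "pay": [50, 80, 40],
    }
)
output = df.set_index("timestamp").resample("1min").ffill(limit=1439)

counts = output["pay"].value_counts()  # NaN excluded by default
n_missing = output["pay"].isna().sum()  # rows beyond the one-day limit

# Every 1-minute row is either a filled value or NaN
assert counts.sum() + n_missing == len(output)
print(n_missing)  # 1533
```

The 1533 empty rows are the part of the 2972-row gap between 2020-10-11 21:55 and 2020-10-13 23:28 that lies beyond the 1439 filled rows.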