How to populate missing rows with the previous or next row in pandas (Python)
I have sample data like this:
date time option_type open high low close volume
6031 9/27/2018 09:17 CE 11500 0.15 0.15 0.15 0.15 1500
6131 9/27/2018 15:19 CE 11500 0.05 0.05 0.05 0.05 1500
6132 9/27/2018 15:22 CE 11500 0.05 0.05 0.05 0.05 75
6133 9/27/2018 15:24 CE 11500 0.05 0.05 0.05 0.05 225
6134 9/27/2018 15:25 CE 11500 0.05 0.05 0.05 0.05 75
6135 9/27/2018 15:26 CE 11500 0.05 0.05 0.05 0.05 600
Some rows are missing, for example 09:15 and 09:16, and then 15:20 and 15:21.
I want to fill each missing row with the previous row's values for gaps such as 15:20/15:21, and with the next available row's values for gaps at the start of the day such as 09:15/09:16. So the 09:17 values would be used for 09:15/09:16, and the 15:19 values for 15:20/15:21.
Could you please help me with this? Thanks in advance; I appreciate your time and effort.
Step 1: Find the time difference between consecutive rows:
df['deltaT'] = pd.to_datetime(df['time'], format='%H:%M').diff().dt.total_seconds().div(60).fillna(0).astype(int)
This gives you a new column holding the number of minutes between each row and the one before it (0 for the first row, which has no predecessor).
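As a quick check of Step 1, here is a minimal sketch on the sample rows from the question. It assumes the time column holds "HH:MM" strings, and it reproduces only the time and close columns for brevity.

```python
import pandas as pd

# Sample rows from the question (only the columns needed here).
df = pd.DataFrame(
    {
        "time": ["09:17", "15:19", "15:22", "15:24", "15:25", "15:26"],
        "close": [0.15, 0.05, 0.05, 0.05, 0.05, 0.05],
    },
    index=[6031, 6131, 6132, 6133, 6134, 6135],
)

# Parse the "HH:MM" strings so consecutive rows can be subtracted.
parsed = pd.to_datetime(df["time"], format="%H:%M")

# Minutes since the previous row; the first row has no predecessor, so use 0.
df["deltaT"] = parsed.diff().dt.total_seconds().div(60).fillna(0).astype(int)

print(df)
```

A deltaT greater than 1 marks a gap: the 3 on the 15:22 row means that 15:20 and 15:21 are missing before it.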
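Step 2: Replicate rows based on the new column deltaT:
df = df.reindex(df.index.repeat(df['deltaT'].astype(int)))
Note that repeat needs integer counts, and a row whose deltaT is 0 (the first row) is repeated zero times and therefore dropped.
Step 3: Build the logic to increment the time column:
df['time'] = pd.to_timedelta(df['time'] + ':00') + pd.to_timedelta(df['deltaT'], unit='m')
The ':00' suffix is there because pd.to_timedelta expects strings in hh:mm:ss form. As written, every copy of a row gets the same offset, so the copies are not yet distinct minutes.
I'm still struggling with that last part, so this answer is incomplete. If you find it helpful and can build on it, great!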
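One way the replicate-and-increment idea might be finished is sketched below. It is not the answer author's code: it repeats each row by the gap to the next row (rather than the previous one) so that copies carry the earlier row's values forward, and it uses a per-row counter to give each copy its own minute. The names gap_next, offset and filled are made up for this illustration, and the sketch only covers gaps between existing rows, not the missing 09:15/09:16 at the start.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "time": ["09:17", "15:19", "15:22", "15:24", "15:25", "15:26"],
        "volume": [1500, 1500, 75, 225, 75, 600],
    }
)
ts = pd.to_datetime(df["time"], format="%H:%M")

# Minutes until the NEXT row; the last row has no successor and is kept once.
gap_next = ts.shift(-1).sub(ts).dt.total_seconds().div(60).fillna(1).astype(int)

# Repeat each row once per minute it has to cover.
filled = df.loc[df.index.repeat(gap_next.to_numpy())].copy()

# Position of each copy within its group of repeats: 0, 1, 2, ...
filled["offset"] = filled.groupby(level=0).cumcount().to_numpy()
filled = filled.reset_index(drop=True)

# Shift every copy forward by its position, in minutes.
filled["time"] = (
    pd.to_datetime(filled["time"], format="%H:%M")
    + pd.to_timedelta(filled["offset"], unit="m")
).dt.strftime("%H:%M")
filled = filled.drop(columns="offset")

print(filled.tail(8))
```

With this variant the 15:20 and 15:21 rows are copies of the 15:19 row, which matches the forward-fill behaviour asked for in the question.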
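I think you are looking for something like this:
df = df.ffill()  # carry the values forward
df = df.bfill()  # carry the values backward
df
Note that these calls only fill NaN values in rows that already exist, so the missing minutes have to be present as rows first (for example after reindexing onto a complete range of minutes). The older spelling fillna(method="ffill") is deprecated in recent pandas versions.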
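To make forward and backward filling apply to the missing minutes, the rows first have to exist. A minimal sketch of that, under some assumptions: the date and time columns are strings in the formats shown in the question, the session is taken to open at 09:15 (the earliest minute the question mentions), and the range ends at the last observed row. The names stamp, by_minute, full_range and filled are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date": ["9/27/2018"] * 6,
        "time": ["09:17", "15:19", "15:22", "15:24", "15:25", "15:26"],
        "close": [0.15, 0.05, 0.05, 0.05, 0.05, 0.05],
        "volume": [1500, 1500, 75, 225, 75, 600],
    }
)

# Combine date and time into one timestamp and use it as the index.
stamp = pd.to_datetime(df["date"] + " " + df["time"], format="%m/%d/%Y %H:%M")
by_minute = df.drop(columns=["date", "time"]).set_index(stamp)

# ASSUMPTION: the session opens at 09:15; adjust to the real market hours.
full_range = pd.date_range(
    start=stamp.min().normalize() + pd.Timedelta(hours=9, minutes=15),
    end=stamp.max(),
    freq="min",
)

# Reindexing inserts the missing minutes as all-NaN rows.
# ffill copies the previous row into gaps such as 15:20/15:21;
# bfill then covers the leading gap (09:15/09:16) from the first real row.
filled = by_minute.reindex(full_range).ffill().bfill()

print(filled.head(3))
```

Reindexing turns integer columns such as volume into floats because of the temporary NaN values. Copying volume into the filled rows follows the question as asked; whether a filled minute should instead carry a volume of 0 is a separate design decision.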