Filling missing data rows with the pandas reindex function

I am trying to fill the missing rows in my time series data using the pandas reindex function.

My data looks like this:

 100,2007,239,4,29.588,-30.851,-999.0,-999.0,-999.0,-999.00,13.125,-999.00
 100,2007,239,5,29.573,-30.843,-999.0,-999.0,-999.0,-999.00,13.126,-999.00
 100,2007,239,14,29.389,-30.880,-999.0,-999.0,-999.0,-999.00,13.131,-999.00
 100,2007,239,15,29.367,-30.901,-999.0,-999.0,-999.0,-999.00,13.131,-999.00
 100,2007,239,24,29.374,-30.920,-999.0,-999.0,-999.0,-999.00,13.135,-999.00
                                                                              

It is time series data for a single day at one-minute intervals, and the fourth column holds the time. Unlike a normal time series index, the time values here run 0 to 59, 100 to 159, ..., 2300 to 2359, because a day has 24 hours and an hour has 60 minutes. To fill the gaps with 'nan' values, I wrote the code below:

S = []
for i in range(0,24):

     s = np.arange(i*100,i*100+60)
     s = list(s)
S = S + s

pd.set_option('max_rows',10)
for INPUT in FileList:
     output = INPUT + "result" # set the output files
     data=pd.read_csv(INPUT,sep=',',index_col=[3],parse_dates=[3])
     index = 'S'#make the reference index to fill
     df = data
     sk_f = df.reindex(index)       
     sk_f.to_csv(output,na_rep='nan')

With this code I intended to fill the gaps with rows of 'nan', following the values in the fourth column and using S as the reference index. But the result is nothing but rows of 'nan' instead of the gaps being filled, as shown below:

,100,2007,241,22.471,-31.002,-999.0,-999.0.1,-999.0.2,-999.00,13.294,-999.00    .1
0,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan
1,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan 
2,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan
3,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan
4,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan
5,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan
6,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan
7,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan 
8,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan
9,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan
10,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan
11,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan

     

My expectation is to fill in the missing rows of the original data. For example, the original data has no rows for index values 0 to 3, so I would like to add those rows in the original data format. I may be missing something.

First, the indentation of the line S = S + s that builds the list is wrong. Because it sits outside the loop, S keeps only the last s. You have to change:

S = []
for i in range(0,24):

     s = np.arange(i*100,i*100+60)
     s = list(s)
S = S + s #keep only last s

to:

S = []
for i in range(0,24):
    s = np.arange(i*100,i*100+60)
    s = list(s)
    S = S + s

or shorter:

S = []
for i in range(0,24):
    S = S + list(np.arange(i*100,i*100+60))
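The same reference index can also be built without a loop or NumPy. This is only a sketch of an alternative, not part of the original answer; it assumes the HHMM layout described in the question (hour times 100 plus minute):

```python
# Build the HHMM-style reference index: 0..59, 100..159, ..., 2300..2359
S = [hour * 100 + minute for hour in range(24) for minute in range(60)]

print(len(S))           # number of minutes in a day
print(S[58:62])         # values around the first hour boundary
```

The values are plain Python ints, which match an integer index read from the CSV.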

Next, index = 'S' is a problem. I think it is a typo and should be index = S. You can also add the bfill() function to fill the gaps backward.

sk_f = df.reindex(index).bfill()
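To make the difference concrete, here is a small sketch on a toy frame. The column name value, the two sample rows and the short target index are made up for illustration. Plain reindex leaves the new rows as NaN, while bfill() copies the next valid row backward:

```python
import pandas as pd

# Toy frame with rows only at index 4 and 5 (illustrative values)
df = pd.DataFrame({"value": [29.588, 29.573]}, index=[4, 5])

target = list(range(8))                # hypothetical reference index 0..7

plain = df.reindex(target)             # labels missing from df become NaN rows
filled = df.reindex(target).bfill()    # each NaN row takes the next valid value

print(plain)
print(filled)
```

Note that bfill() cannot fill rows 6 and 7 here, because no valid row comes after them. If the goal is rows of 'nan', as in the question, the plain reindex result is probably the one to keep.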

Code:

import pandas as pd
import numpy as np
import io

S = []
for i in range(0,24):
    S = S + list(np.arange(i*100,i*100+60))

#original data
temp=u"""100,2007,239,4,29.588,-30.851,-999.0,-999.0,-999.0,-999.00,13.125,-999.00
100,2007,239,5,29.573,-30.843,-999.0,-999.0,-999.0,-999.00,13.126,-999.00
100,2007,239,14,29.389,-30.880,-999.0,-999.0,-999.0,-999.00,13.131,-999.00
100,2007,239,15,29.367,-30.901,-999.0,-999.0,-999.0,-999.00,13.131,-999.00
100,2007,239,24,29.374,-30.920,-999.0,-999.0,-999.0,-999.00,13.135,-999.00"""

#pd.set_option('max_rows',10)

# the fourth column holds integer HHMM codes, so it is read as a plain integer index (no parse_dates)
data=pd.read_csv(io.StringIO(temp),sep=',', header=None, index_col=[3])
data.index.name = None
print(data)

#     0     1    2       4       5    6    7    8    9       10   11
#4   100  2007  239  29.588 -30.851 -999 -999 -999 -999  13.125 -999
#5   100  2007  239  29.573 -30.843 -999 -999 -999 -999  13.126 -999
#14  100  2007  239  29.389 -30.880 -999 -999 -999 -999  13.131 -999
#15  100  2007  239  29.367 -30.901 -999 -999 -999 -999  13.131 -999
#24  100  2007  239  29.374 -30.920 -999 -999 -999 -999  13.135 -999

index = S #make the reference index to fill
df = data
sk_f = df.reindex(index).bfill()

print(sk_f.head(20))
#     0     1    2       4       5    6    7    8    9       10   11
#0   100  2007  239  29.588 -30.851 -999 -999 -999 -999  13.125 -999
#1   100  2007  239  29.588 -30.851 -999 -999 -999 -999  13.125 -999
#2   100  2007  239  29.588 -30.851 -999 -999 -999 -999  13.125 -999
#3   100  2007  239  29.588 -30.851 -999 -999 -999 -999  13.125 -999
#4   100  2007  239  29.588 -30.851 -999 -999 -999 -999  13.125 -999
#5   100  2007  239  29.573 -30.843 -999 -999 -999 -999  13.126 -999
#6   100  2007  239  29.389 -30.880 -999 -999 -999 -999  13.131 -999
#7   100  2007  239  29.389 -30.880 -999 -999 -999 -999  13.131 -999
#8   100  2007  239  29.389 -30.880 -999 -999 -999 -999  13.131 -999
#9   100  2007  239  29.389 -30.880 -999 -999 -999 -999  13.131 -999
#10  100  2007  239  29.389 -30.880 -999 -999 -999 -999  13.131 -999
#11  100  2007  239  29.389 -30.880 -999 -999 -999 -999  13.131 -999
#12  100  2007  239  29.389 -30.880 -999 -999 -999 -999  13.131 -999
#13  100  2007  239  29.389 -30.880 -999 -999 -999 -999  13.131 -999
#14  100  2007  239  29.389 -30.880 -999 -999 -999 -999  13.131 -999
#15  100  2007  239  29.367 -30.901 -999 -999 -999 -999  13.131 -999
#16  100  2007  239  29.374 -30.920 -999 -999 -999 -999  13.135 -999
#17  100  2007  239  29.374 -30.920 -999 -999 -999 -999  13.135 -999
#18  100  2007  239  29.374 -30.920 -999 -999 -999 -999  13.135 -999
#19  100  2007  239  29.374 -30.920 -999 -999 -999 -999  13.135 -999
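If the aim is what the question describes, rows of 'nan' rather than back-filled values, a minimal sketch could drop bfill() and let to_csv write the missing values. It assumes the files have no header row (hence header=None), and it writes to an in-memory buffer instead of a real output file:

```python
import io
import pandas as pd

raw = """100,2007,239,4,29.588,-30.851,-999.0,-999.0,-999.0,-999.00,13.125,-999.00
100,2007,239,5,29.573,-30.843,-999.0,-999.0,-999.0,-999.00,13.126,-999.00
100,2007,239,14,29.389,-30.880,-999.0,-999.0,-999.0,-999.00,13.131,-999.00"""

# Reference index in HHMM form: 0..59, 100..159, ..., 2300..2359
S = [hour * 100 + minute for hour in range(24) for minute in range(60)]

# header=None keeps the first data row from being consumed as column names
data = pd.read_csv(io.StringIO(raw), header=None, index_col=3)

out = data.reindex(S)                  # one row per minute, NaN where no data exists

buf = io.StringIO()                    # stand-in for the real output path
out.to_csv(buf, header=False, na_rep="nan")
lines = buf.getvalue().splitlines()

print(lines[0])                        # a gap row
print(lines[4])                        # a row that exists in the input
```

In this sketch the time column is written first, because it is the index, and the integer columns come back as floats once NaN rows are introduced. Both may need adjusting if the output has to match the input layout exactly.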
