简体   繁体   中英

Fill in missing values based on series and populate second row based on previous or next row

I have a csv with 4 columns. The file contains some missing rows based on the series.

Input:-

No  A   B   C
1   10  50  12
3   40  50  12
4   20  60  15
6   80  80  18

Output:-

No  A   B   C
1   10  50  12
2   10  50  12
3   40  50  12
4   20  60  15
5   20  60  15
6   80  80  18

I need python and pandas code to generate the above output.

Use if No is column - create index by No and DataFrame.reindex by range with all possible values:

v = range(df['No'].min(), df['No'].max() + 1)
df1 = df.set_index('No').reindex(v, method='ffill').reset_index()
print (df1)
   No   A   B   C
0   1  10  50  12
1   2  10  50  12
2   3  40  50  12
3   4  20  60  15
4   5  20  60  15
5   6  80  80  18

Use if No is index solution is changed a bit:

v = range(df.index.min(), df.index.max() + 1)
df1 = df.reindex(v, method='ffill')
print (df1)
     A   B   C
No            
1   10  50  12
2   10  50  12
3   40  50  12
4   20  60  15
5   20  60  15
6   80  80  18

Create a dataframe of your missing rows

missing_list = [[i] + [pd.np.nan]*(df.shape[1] - 1) for i in range(df.No.min(), df.No.max()) if i not in df.No]
missing_df = pd.DataFrame(missing_list, columns=df.columns)

Concat to original dataframe, sort and forward fill

pd.concat([df, missing_df]).sort_values('No').ffill()

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM