Drop leading rows of Pandas Dataframe containing NaN

Question

Prior to cleaning up a Pandas Dataframe of a time series, I want to drop the rows at the top that contain NaN in certain columns.

I wanted to iterate over the start of the dataframe and drop each row where the column is NaN. My dataframe below is called "train" and contains two columns: 'Date' and 'Maximum temperature (Degree C)'. I set Date as the index. The first 20 or so rows contain NaN in 'Maximum temperature (Degree C)'.

#Drop NaN values at start of dataframe

for date,row in train.iterrows():
  print(date)
  if train.loc[date,'Maximum temperature (Degree C)']==np.nan:
      train.drop(index=date, inplace=True)
  else:
    break

I expected the code to drop the rows from the start of the dataframe, but my if statement doesn't pick up the NaN, so the loop breaks after the first row.

Answer 1

Instead of dropping rows one at a time, you can use first_valid_index() to find the first row that has a value, as follows:

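import pandas as pd

# Sample dataframe whose columns start with differing numbers of missing values
df = pd.DataFrame({"A": [None, None, 2, 4, 5],
                   "B": [None, None, None, 44, 2],
                   "C": [None, None, None, 1, 5]})

# Index label of the first non-NaN value in column C
df.C.first_valid_index()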

Output:

3

Then use

df.loc[3:]

OR

df.loc[df.C.first_valid_index():]

Output:

     A     B    C
3  4.0  44.0  1.0
4  5.0   2.0  5.0
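
Applied to the dataframe in the question, the same idea might look like the sketch below. The dates and temperatures are made-up sample values; only the name `train` and the column name come from the question. It also shows why the original `== np.nan` test never matched: NaN compares unequal to everything, including itself, so `pd.isna()` is the usual check.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the question's "train" dataframe
col = "Maximum temperature (Degree C)"
train = pd.DataFrame(
    {col: [np.nan, np.nan, 21.5, np.nan, 23.0]},
    index=pd.date_range("2019-01-01", periods=5, freq="D", name="Date"),
)

# NaN never compares equal to anything, so an == test is always False
print(np.nan == np.nan)              # False
print(pd.isna(train[col].iloc[0]))   # True

# Index label of the first row that has a real value in the column
first = train[col].first_valid_index()

# first_valid_index() returns None when the whole column is NaN,
# and .loc[None:] would keep every row, so handle that case explicitly
if first is not None:
    trimmed = train.loc[first:]
else:
    trimmed = train.iloc[0:0]

print(trimmed)
```

Only the leading NaN rows are removed; a NaN that appears later in the series is kept, so it can be handled in the subsequent cleaning step.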

Hope this helps


solution1
1 2019-08-31 07:54:57