Before cleaning up a pandas DataFrame that holds a time series, I want to drop the rows at the top that contain NaN in certain columns.
I tried to iterate from the start of the DataFrame and drop each row where the column is NaN. My DataFrame below is called "train" and contains two columns, 'Date' and 'Maximum temperature (Degree C)'. I set Date as the index. The first 20 or so rows contain NaN in 'Maximum temperature (Degree C)'.
# Drop NaN values at start of dataframe
for date, row in train.iterrows():
    print(date)
    if train.loc[date, 'Maximum temperature (Degree C)'] == np.nan:
        train.drop(index=date, inplace=True)
    else:
        break
I expected the code to drop the rows from the start of the DataFrame, but my if statement doesn't pick up the NaN, so the loop breaks after the first row.
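The if statement never fires because NaN does not compare equal to anything, itself included, so an `==` test against np.nan is always False. A minimal sketch of the difference, using a standalone value rather than the "train" data (the variable names here are only for illustration):

```python
import numpy as np
import pandas as pd

value = np.nan

# Equality against NaN is always False, even for NaN itself.
eq_result = (value == np.nan)

# pd.isna() is the usual way to test a scalar (or a whole column) for NaN.
isna_result = pd.isna(value)

print(eq_result, isna_result)
```

Swapping the `== np.nan` test in the loop for pd.isna(...) would make the condition detect the missing values, although the loop is not needed at all for this task.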
Instead of dropping rows one at a time, you can use first_valid_index() as follows:
import pandas as pd

# dataframe
df = pd.DataFrame({"A": [None, None, 2, 4, 5],
                   "B": [None, None, None, 44, 2],
                   "C": [None, None, None, 1, 5]})
df.C.first_valid_index()
Output:
3
Then use
df.loc[3:]
or
df[df.C.first_valid_index():]
Output:
     A     B    C
3  4.0  44.0  1.0
4  5.0   2.0  5.0
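The same idea should carry over to a date-indexed frame like the one in the question. The sketch below assumes a small made-up "train" frame (the dates and temperatures are invented for illustration) and uses label-based slicing with .loc, so it does not depend on the index being integers:

```python
import numpy as np
import pandas as pd

col = "Maximum temperature (Degree C)"

# Hypothetical stand-in for the question's "train" frame, indexed by Date.
train = pd.DataFrame(
    {col: [np.nan, np.nan, 21.5, np.nan, 23.0]},
    index=pd.date_range("2020-01-01", periods=5, freq="D", name="Date"),
)

# Label of the first row where the column is not NaN.
first_valid = train[col].first_valid_index()

# Keep everything from that label onward; later NaN rows are left in place.
trimmed = train.loc[first_valid:]

print(first_valid)
print(trimmed)
```

Only the leading NaN rows are removed this way. A NaN that appears later in the series stays, which is usually what you want before interpolating or filling gaps.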
Hope this helps