I have a DataFrame that contains NaN values. I want to remove the NaN rows at the start only, and keep any NaN that appears after the first real number.
Suppose my DataFrame looks like this:
a = pd.DataFrame({'data':[np.nan,np.nan,np.nan,np.nan,4,5,6,2,np.nan,1,3,4,5,np.nan,4,5,np.nan,np.nan]})
a=
data
0 NaN
1 NaN
2 NaN
3 NaN
4 4.0
5 5.0
6 6.0
7 2.0
8 NaN
9 1.0
10 3.0
11 4.0
12 5.0
13 NaN
14 4.0
15 5.0
16 NaN
17 NaN
I want to remove the NaN rows at the beginning and end up with a DataFrame like this:
data
4 4.0
5 5.0
6 6.0
7 2.0
8 NaN
9 1.0
10 3.0
11 4.0
12 5.0
13 NaN
14 4.0
15 5.0
16 NaN
17 NaN
I tried the following loop, but it does not work. Any help will be highly appreciated.
for w in np.arange(len(a)):
    if a.iloc[w] == np.nan:
        a.drop(a.index[w])
Get the first valid index and slice from it:
idx = a.first_valid_index()
a.loc[idx:]
data
4 4.0
5 5.0
6 6.0
7 2.0
8 NaN
9 1.0
10 3.0
11 4.0
12 5.0
13 NaN
14 4.0
15 5.0
16 NaN
17 NaN
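For context on why the loop in the question has no effect: NaN never compares equal to anything, including itself, so `a.iloc[w] == np.nan` is never true, and `drop` returns a new object instead of modifying `a`. The sketch below wraps the `first_valid_index()` approach in a helper. The name `drop_leading_nan` and the choice to return an empty frame when no valid value exists are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

def drop_leading_nan(df):
    """Hypothetical helper: drop the rows before the first row with a valid value."""
    idx = df.first_valid_index()  # None when the frame has no valid value at all
    if idx is None:
        # Assumption: an all-NaN frame should come back empty.
        # Without this guard, df.loc[None:] would return every row.
        return df.iloc[0:0]
    return df.loc[idx:]  # label-based slice, start label included

a = pd.DataFrame({'data': [np.nan, np.nan, np.nan, np.nan, 4, 5, 6, 2, np.nan,
                           1, 3, 4, 5, np.nan, 4, 5, np.nan, np.nan]})
trimmed = drop_leading_nan(a)
all_nan = drop_leading_nan(pd.DataFrame({'data': [np.nan, np.nan]}))
```

Here `trimmed` starts at index label 4 and keeps the later NaN rows, while `all_nan` is empty.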
Try something like this:
start = a[a.data.notnull()].index[0]
new_df = a.loc[start:]
The first line finds the index label of the first non-null value; the second slices the DataFrame from that label onward, which cuts out all the entries before it.
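Because `.loc` slices by label, the same two lines should also work when the index is not the default 0..n-1 range. A small sketch, using a made-up frame `df` with string labels (not data from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a string index, for illustration only.
df = pd.DataFrame({'data': [np.nan, np.nan, 7.0, np.nan, 9.0]},
                  index=list('vwxyz'))

start = df[df.data.notnull()].index[0]  # label of the first non-null row
new_df = df.loc[start:]                 # keeps that row and everything after it
```

Note that `index[0]` raises an `IndexError` if the column holds no non-null value, so an all-NaN column would need a separate check.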
Instead of removing the "bad" rows, you can locate and preserve the "good" rows:
b = a[a.data.ffill().notnull()]
# data
#4 4.0
#5 5.0
#6 6.0
#7 2.0
#8 NaN
#9 1.0
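To see why this mask keeps the right rows: a forward fill copies the last valid value downward, so the only positions that stay NaN afterwards are those with no valid value above them, that is, the leading ones. A step-by-step sketch, assuming the frame `a` from the question and using the `ffill()` method form; the intermediate names `filled` and `mask` are introduced here for illustration:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'data': [np.nan, np.nan, np.nan, np.nan, 4, 5, 6, 2, np.nan,
                           1, 3, 4, 5, np.nan, 4, 5, np.nan, np.nan]})

filled = a['data'].ffill()  # later NaNs get filled; leading NaNs have nothing to copy
mask = filled.notnull()     # False only for the leading NaN rows
b = a[mask]                 # selects from the original frame, so later NaNs are kept
```

The forward-filled values are used only to build the mask; `b` still contains the original NaN entries at labels 8, 13, 16 and 17.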
You should use first_valid_index(), but here is another way :-)
# nonzero() returns positions, so go through NumPy and slice with iloc
a.iloc[a.data.notnull().to_numpy().nonzero()[0][0]:]
Out[1276]:
data
4 4.0
5 5.0
6 6.0
7 2.0
8 NaN
9 1.0
10 3.0
11 4.0
12 5.0
13 NaN
14 4.0
15 5.0
16 NaN
17 NaN