I have a data frame with many null records:
Col_1 Col_2 Col_3
10 5 2
22 7 7
3 9 5
4 NaN NaN
5 NaN NaN
6 4 NaN
7 6 7
8 10 NaN
12 NaN 1
I want to remove all NaN values from each column. As you can see, each column would then have a different number of rows. So, I want to get something like this:
Col_1 Col_2 Col_3
10 5 2
22 7 7
3 9 5
4 4 7
6 6 1
7 10
8
12
I tried
filtered_df = df.dropna(how='any')
But that drops every row that contains a NaN, which removes almost all records from the dataframe. How can I do that?
Using Divakar's justify function:
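For reference, the justify helper from Divakar's answer looks roughly like this (a float-only sketch; the original also handles non-NaN invalid values):

```python
import numpy as np

def justify(a, invalid_val=0, axis=1, side='left'):
    """Push the valid values of 2D array `a` to one side along `axis`."""
    if invalid_val is np.nan:
        mask = ~np.isnan(a)  # NaN must be detected with isnan, not ==
    else:
        mask = a != invalid_val
    # Sorting the boolean mask moves the True (valid) flags to one end
    justified_mask = np.sort(mask, axis=axis)
    if (side == 'up') | (side == 'left'):
        justified_mask = np.flip(justified_mask, axis=axis)
    out = np.full(a.shape, invalid_val, dtype=float)
    if axis == 1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out
```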
import numpy as np

df[:] = justify(df.values, invalid_val=np.nan, axis=0, side='up')
df = df.fillna('')
print(df)
Col_1 Col_2 Col_3
0 10.0 5 2
1 22.0 7 7
2 3.0 9 5
3 4.0 4 7
4 5.0 6 1
5 6.0 10
6 7.0
7 8.0
8 12.0
As you can see, each column now has a different number of rows.
A DataFrame is a tabular data structure: you can look up an index and a column, and find the value. If the number of rows differs per column, the index becomes meaningless and misleading. A dict might be a better alternative:
{c: df[c].dropna().values for c in df.columns}
or
{c: list(df[c].dropna()) for c in df.columns}
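With the question's sample data, the dict approach yields per-column arrays of different lengths:

```python
import numpy as np
import pandas as pd

# Sample frame matching the question
df = pd.DataFrame({
    'Col_1': [10, 22, 3, 4, 5, 6, 7, 8, 12],
    'Col_2': [5, 7, 9, np.nan, np.nan, 4, 6, 10, np.nan],
    'Col_3': [2, 7, 5, np.nan, np.nan, np.nan, 7, np.nan, 1],
})

# Each column keeps only its own non-null values
cols = {c: df[c].dropna().values for c in df.columns}
# Col_1 keeps 9 values, Col_2 keeps 6, Col_3 keeps 5
print({c: len(v) for c, v in cols.items()})
```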
You can also use pd.concat on a list of series. Note that columns Col_2 and Col_3 unavoidably become float due to the NaN elements, unless you use dtype=object.
res = pd.concat([df[x].dropna().reset_index(drop=True) for x in df], axis=1)
print(res)
Col_1 Col_2 Col_3
0 10 5.0 2.0
1 22 7.0 7.0
2 3 9.0 5.0
3 4 4.0 7.0
4 5 6.0 1.0
5 6 10.0 NaN
6 7 NaN NaN
7 8 NaN NaN
8 12 NaN NaN
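If you'd rather keep integer dtypes despite the ragged column lengths, one option (a sketch using pandas' nullable Int64 extension dtype) is to cast each column after dropping its NaN values, before concatenating:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Col_1': [10, 22, 3, 4, 5, 6, 7, 8, 12],
    'Col_2': [5, 7, 9, np.nan, np.nan, 4, 6, 10, np.nan],
    'Col_3': [2, 7, 5, np.nan, np.nan, np.nan, 7, np.nan, 1],
})

# Cast to nullable Int64 after dropping NaN, so the padding that
# concat reintroduces is <NA> rather than a float-coercing NaN
res = pd.concat(
    [df[c].dropna().astype('Int64').reset_index(drop=True) for c in df],
    axis=1,
)
print(res.dtypes)
```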
You can also try this:
censos_data.dropna(subset=censos_data.columns, inplace=True)