
How to fill missing values using pandas?

I'm trying to fill missing values with another array which is predicted by a regressor. I don't know how to replace the missing values with corresponding values in that array.

For example, I have:

[0, 1, 2, NaN, NaN] 

and

[0, 0, 1, 2, 3]

How can I replace these NaN with 2 and 3? It seems that fillna can't do this.

Sorry for having asked an ambiguous question.

First, you have to clearly identify what a missing value means in your data: depending on the dataset, NaN, a string, an integer, or even 0 can represent a missing value.

If your missing values are already NaN, the easiest approach is the following. If they are encoded some other way, you can always convert them to NaN first with replace.

# let df be your DataFrame and x the value you want to fill NaN with;
# fillna returns a new DataFrame, so assign the result back
df = df.fillna(x)
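As a minimal sketch of the replace-then-fillna route, assume the dataset marks missing entries with the hypothetical sentinel -1 and that 0 is an acceptable fill value (both are illustrative choices, not taken from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: -1 is assumed to be this dataset's "missing" marker
df = pd.DataFrame({"a": [0, 1, -1, 3], "b": [5.0, np.nan, 7.0, -1]})

# Step 1: turn the sentinel into a real NaN so pandas treats it as missing
df = df.replace(-1, np.nan)

# Step 2: fill every NaN with a scalar; fillna returns a new DataFrame
df = df.fillna(0)

print(df)
```

Column a becomes float64 after the replace step, because an integer column cannot hold NaN.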

The second way is to impute values with scikit-learn. Below is a simple example using SimpleImputer, assuming your missing values are NaN and you want to fill them with the mean of each column.

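from sklearn.impute import SimpleImputer
import numpy as np
# the keyword is missing_values (plural); fit_transform returns a NumPy array, not a DataFrame
df = SimpleImputer(missing_values=np.nan, strategy='mean').fit_transform(df)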

You can change the strategy to a different method, such as 'median' (the median of each column), 'most_frequent', or 'constant' (used together with fill_value). It all depends on what works best for your data.
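If you would rather stay in pandas, a rough equivalent of the 'mean' and 'median' strategies is to pass per-column statistics to fillna. This is a sketch assuming all columns are numeric (the data below is made up); unlike SimpleImputer's output, the result is still a DataFrame with its index and column names intact:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data with gaps
df = pd.DataFrame({"x": [1.0, np.nan, 3.0, 5.0], "y": [np.nan, 2.0, 4.0, 9.0]})

# mean()/median() skip NaN by default; fillna aligns the resulting Series
# on column names, so each column is filled with its own statistic
mean_filled = df.fillna(df.mean())
median_filled = df.fillna(df.median())

print(mean_filled)
print(median_filled)
```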

Suppose the two arrays are stored as DataFrames:

import numpy as np
import pandas as pd
arr1 = pd.DataFrame([0, 1, 2, np.nan, np.nan])  # np.NaN was removed in NumPy 2.0
arr2 = pd.DataFrame([0, 0, 1, 2, 3])

You can replace the NaN values in arr1 with the corresponding elements of arr2 via fillna, which aligns the two DataFrames on index and columns:

arr1.fillna(arr2, inplace=True)

After executing fillna, arr1 holds the following values (stored as floats, because the column originally contained NaN):

arr1 = [0, 1, 2, 2, 3]
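If the data and the regressor's predictions are plain NumPy arrays rather than DataFrames, the same element-wise fill can be done with np.where. The second part is a sketch for the common case where the regressor is only run on the missing rows, so it returns one prediction per NaN; the preds values are made up for illustration:

```python
import numpy as np
import pandas as pd

a = np.array([0, 1, 2, np.nan, np.nan])
b = np.array([0, 0, 1, 2, 3])

# Take b where a is NaN, otherwise keep a (both arrays must have the same shape)
filled = np.where(np.isnan(a), b, a)
print(filled)  # [0. 1. 2. 2. 3.]

# Case 2: predictions exist only for the missing positions, in order
s = pd.Series([0, 1, 2, np.nan, np.nan])
preds = np.array([2.0, 3.0])  # hypothetical regressor output for the 2 NaNs
mask = s.isna()
s.loc[mask] = preds  # assigns preds to the NaN rows in index order
print(s.tolist())  # [0.0, 1.0, 2.0, 2.0, 3.0]
```

The boolean-mask assignment requires len(preds) to equal the number of NaN rows; otherwise pandas raises an error.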

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.