
Pandas: Fill nan values in multiple columns with respective median values but accessing the columns using indices

I have a DataFrame with 15 columns and 5000 rows. Four of the columns contain NaN values, and I would like to replace those NaN values with the median of their column.

Since several columns are affected, I would like to do this in a for-loop over the column numbers 1, 5, 8 and 9, so that the NaN values in each column are filled with that column's median.

I tried:

for i in [1,5,8,9]:
    df[i] = df[i].fillna(df[i].transform('median'))

There is no need for a loop; use a vectorized approach:

out = df.fillna(df.median())

Or, to limit the fill to specific column names:

cols = [1, 5, 8, 9]
# or automatic selection of the columns that contain NaNs
# cols = df.columns[df.isna().any()]

out = df.fillna(df[cols].median())

Or, by positional indices:

col_pos = [1, 5, 8, 9]
out = df.fillna(df.iloc[:, col_pos].median())
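The positional variant works because median() returns a Series indexed by column label, and fillna matches that Series against the columns by label, leaving every other column untouched. A minimal sketch of this, assuming a small made-up frame with string column names (the names "a", "b", "c" and the values are illustrative only, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with string labels, so position and label differ
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 2.0, 4.0],
    "c": [5.0, 6.0, np.nan],
})

col_pos = [0, 2]                        # positions of "a" and "c"
medians = df.iloc[:, col_pos].median()  # Series indexed by "a" and "c"
out = df.fillna(medians)                # only "a" and "c" are filled

print(medians)
print(out)
```

Column "b" is not in the medians Series, so its NaN stays in place.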

Output of df.fillna(df.median()) on the example input:

   0    1  2    3    4  5  6    7    8  9
0  9  7.0  1  3.0  5.0  7  3  6.0  6.0  7
1  9  1.0  9  6.0  4.5  3  8  4.0  1.0  4
2  5  3.5  3  1.0  4.0  4  4  3.5  3.0  8
3  4  6.0  9  3.0  3.0  2  1  2.0  1.0  3
4  4  1.0  1  3.0  7.0  8  4  3.0  5.0  6

Example input used:

   0    1  2    3    4  5  6    7    8  9
0  9  7.0  1  3.0  5.0  7  3  6.0  6.0  7
1  9  1.0  9  6.0  NaN  3  8  4.0  1.0  4
2  5  NaN  3  1.0  4.0  4  4  NaN  NaN  8
3  4  6.0  9  3.0  3.0  2  1  2.0  1.0  3
4  4  1.0  1  NaN  7.0  8  4  3.0  5.0  6
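To reproduce this, the example input can be rebuilt by hand. The sketch below assumes the default integer column labels 0 to 9 and compares the fill-everything call with the one restricted to columns 1, 5, 8 and 9:

```python
import numpy as np
import pandas as pd

nan = np.nan
# Example input from above, with default integer column labels 0..9
df = pd.DataFrame([
    [9, 7.0, 1, 3.0, 5.0, 7, 3, 6.0, 6.0, 7],
    [9, 1.0, 9, 6.0, nan, 3, 8, 4.0, 1.0, 4],
    [5, nan, 3, 1.0, 4.0, 4, 4, nan, nan, 8],
    [4, 6.0, 9, 3.0, 3.0, 2, 1, 2.0, 1.0, 3],
    [4, 1.0, 1, nan, 7.0, 8, 4, 3.0, 5.0, 6],
])

# Fill every column with its own median
out_all = df.fillna(df.median())

# Fill only the columns labelled 1, 5, 8 and 9
cols = [1, 5, 8, 9]
out_some = df.fillna(df[cols].median())

print(out_all)
print(out_some)
```

With the restricted call, the NaNs in columns 3, 4 and 7 remain, because those columns are not in cols.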

You can simply do:

df[[1,5,8,9]] = df[[1,5,8,9]].fillna(df[[1,5,8,9]].median())
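If a loop over column positions is still preferred, as in the question, one possible sketch is to look up each label from its position and fill that column with its own median. Since median() already returns a single value for a Series, transform is not needed. The frame below, its column names and the positions [1, 3] are made up for illustration, and the sketch assumes the column labels are unique:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; columns at positions 1 and 3 contain NaNs
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "height": [1.0, np.nan, 3.0, 5.0],
    "group": ["x", "y", "x", "y"],
    "weight": [np.nan, 10.0, 20.0, np.nan],
})

for pos in [1, 3]:
    label = df.columns[pos]  # translate position to column label
    df[label] = df[label].fillna(df[label].median())

print(df)
```

Like the one-line assignment above, this modifies df in place and touches only the chosen columns.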
