pandas DataFrame: replace values in multiple columns with the value from another

Question

I've got a pandas DataFrame where I want to replace certain values in a selection of columns with the value from another column in the same row.

I did the following:

df[cols[23:30]] = df[cols[23:30]].apply(lambda x: x.replace(99, df['col1']))
df[cols[30:36]] = df[cols[30:36]].apply(lambda x: x.replace(99, df['col2']))

cols is a list of column names.
99 is treated as a missing value, which I want to replace with the (already calculated) mean for the given class (i.e., col1 or col2, depending on the selection).

It works, but replacing all those values takes longer than seems necessary. I figure there must be a computationally quicker way of achieving the same result.

Any suggestions?

Answer 1

You can try:

import numpy as np

df[cols[23:30]] = np.where(df[cols[23:30]] == 99, df[['col1'] * (30-23)], df[cols[23:30]])

df[cols[30:36]] = np.where(df[cols[30:36]] == 99, df[['col2'] * (36-30)], df[cols[30:36]])

df[["col1"] * n] creates a DataFrame with the same column repeated n times, so np.where gets a block of replacement values with the same shape as the n columns you are updating. The comparison with 99 acts as the mask: wherever it is True, the value is taken from the repeated column; everywhere else the value that is already there is kept.
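As a rough illustration of the idea above, here is a minimal sketch on a toy frame. The column names a and b and the sample values are hypothetical stand-ins for df[cols[23:30]]; only col1 and the sentinel 99 come from the question. The first variant broadcasts col1 as a column vector instead of repeating it, and the second uses DataFrame.mask with axis=0, which is a swapped-in alternative rather than the method from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'a' and 'b' stand in for the selected columns,
# 'col1' holds the precomputed per-row replacement value.
df = pd.DataFrame({
    "col1": [1.5, 2.5, 3.5],
    "a": [99, 10, 99],
    "b": [20, 99, 30],
})
target = ["a", "b"]

# Variant 1: np.where, with col1 reshaped to (n_rows, 1) so that NumPy
# broadcasts it across every target column (no repeated columns needed).
fill = df["col1"].to_numpy()[:, None]
result_where = df.copy()
result_where[target] = np.where(df[target] == 99, fill, df[target])

# Variant 2: DataFrame.mask replaces values where the condition is True;
# axis=0 aligns the replacement Series with the row index.
result_mask = df.copy()
result_mask[target] = df[target].mask(df[target] == 99, df["col1"], axis=0)

print(result_where)
print(result_mask)
```

Both variants run a single vectorized pass over the selected block, rather than calling replace once per column. Note that the integer columns are upcast to float because the replacement values are floats.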

1 answer

solution1
0 ACCEPTED 2019-11-11 11:55:28
