Replacing dataframe values with NaN based on condition while preserving shape of df

Question

Python newbie here. I'm sure this is a trivial question, but after a bit of Google-fu I haven't found a solution. So here goes: given a dataframe like this:

import numpy as np
import pandas as pd

raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
            'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],
            'age': [42, 52, 36, 24, 73],
            'preTestScore': [-999, -999, -999, 2, 1],
            'postTestScore': [2, 2, -999, 2, -999]}
df = pd.DataFrame(raw_data, columns=['first_name', 'last_name', 'age', 'preTestScore', 'postTestScore'])

How do I do the following: if preTestScore is -999, replace both preTestScore and postTestScore in that row with NaN?

I can replace values with NaN using df.replace(-999, np.nan), but that replaces every -999 unconditionally. Here the replacement depends on the value of preTestScore and has to be applied across two columns.
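To see why a plain replace is not enough, here is a minimal sketch that rebuilds the question's dataframe and applies df.replace to the whole frame. The name unconditional is just an illustrative label. Amy's postTestScore (row 4) also becomes NaN even though her preTestScore is 1, which is exactly the row the question wants left alone.

```python
import numpy as np
import pandas as pd

raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
            'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],
            'age': [42, 52, 36, 24, 73],
            'preTestScore': [-999, -999, -999, 2, 1],
            'postTestScore': [2, 2, -999, 2, -999]}
df = pd.DataFrame(raw_data)

# replace() swaps every -999 in the frame, regardless of the row's preTestScore
unconditional = df.replace(-999, np.nan)

# Row 4: preTestScore is 1, yet postTestScore was still turned into NaN
print(unconditional.loc[4, ['preTestScore', 'postTestScore']])
```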

Thank you kindly

Answer 1

Use loc with a boolean mask and a list of columns to set to NaN:

df.loc[df['preTestScore'] == -999, ['preTestScore','postTestScore']] = np.nan
print(df)

  first_name last_name  age  preTestScore  postTestScore
0      Jason    Miller   42           NaN            NaN
1      Molly  Jacobson   52           NaN            NaN
2       Tina       Ali   36           NaN            NaN
3       Jake    Milner   24           2.0            2.0
4        Amy     Cooze   73           1.0         -999.0
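A self-contained sketch of the same loc approach, using a trimmed copy of the data (only first_name and the two score columns, an assumption for brevity). The row mask picks the rows and the column list picks the cells, so nothing is dropped and the frame keeps its shape. The scores print as floats in the output above because NaN is a float, so pandas upcasts the integer columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
                   'preTestScore': [-999, -999, -999, 2, 1],
                   'postTestScore': [2, 2, -999, 2, -999]})
cols = ['preTestScore', 'postTestScore']

# Boolean row mask: True where the pre-test score is the -999 sentinel
rows = df['preTestScore'] == -999

# Row mask + column list selects only the cells to overwrite;
# first_name and the unflagged rows are left untouched
df.loc[rows, cols] = np.nan
print(df)
```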

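Detail: the boolean mask, evaluated on the original df (before the assignment above turns those values into NaN):

print(df['preTestScore'] == -999)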
0     True
1     True
2     True
3    False
4    False
Name: preTestScore, dtype: bool
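Because the assignment overwrites the very column the condition reads, re-running the comparison afterwards no longer finds any -999 (NaN == -999 is False). A minimal sketch of keeping the mask in a variable first, so it can be inspected or reused; the name is_missing is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'preTestScore': [-999, -999, -999, 2, 1],
                   'postTestScore': [2, 2, -999, 2, -999]})

# Build the mask from the original values; it is a separate boolean Series
is_missing = df['preTestScore'] == -999
print(is_missing.tolist())   # [True, True, True, False, False]

df.loc[is_missing, ['preTestScore', 'postTestScore']] = np.nan

# Re-evaluating the comparison now matches nothing: the -999s are gone
print((df['preTestScore'] == -999).any())   # False
```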

Answer 2

`pandas.DataFrame.mask`

Inline

cols = ['preTestScore', 'postTestScore']
df.assign(**df[cols].mask(df[cols[0]].eq(-999)))

  first_name last_name  age  preTestScore  postTestScore
0      Jason    Miller   42           NaN            NaN
1      Molly  Jacobson   52           NaN            NaN
2       Tina       Ali   36           NaN            NaN
3       Jake    Milner   24           2.0            2.0
4        Amy     Cooze   73           1.0         -999.0
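The key detail is that the condition passed to mask is a Series (one flag per row), not a DataFrame. A small sketch on made-up scores, assuming current pandas behaviour where a Series condition is broadcast across every column of the frame: a flagged row is blanked in both columns, while an element-wise condition would blank each -999 cell independently.

```python
import pandas as pd

scores = pd.DataFrame({'preTestScore': [-999, 2, 1],
                       'postTestScore': [-999, 2, -999]})

# One boolean per row, taken from the first column only
row_flag = scores['preTestScore'].eq(-999)

# Row-wise condition: a flagged row is blanked in every column
masked = scores.mask(row_flag)

# For contrast, an element-wise condition blanks each -999 cell on its own,
# including row 2's postTestScore, whose preTestScore is not -999
elementwise = scores.mask(scores.eq(-999))

print(masked)
print(elementwise)
```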

Explanation

I use cols to avoid writing out the long column names; cols[0] is a shortcut for 'preTestScore'.
df[cols].mask(df[cols[0]].eq(-999)) sets both columns to NaN in every row where preTestScore is -999 (mask fills with NaN by default).
I use assign to produce a new dataframe with the replaced columns, leaving the original dataframe untouched. If you want to keep the result, assign it to a name. You can even reuse the old name: df = df.assign(**df[cols].mask(df[cols[0]].eq(-999)))
assign takes keyword arguments, which you can pass by unpacking a dictionary with the double splat (**). Conveniently, a dataframe unpacks like a dictionary, with the column names as the keywords and the columns as the values, which is exactly what assign needs.
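A short sketch of the unpacking step on a two-row toy frame (the names new_cols, as_kwargs and result are illustrative). It shows that ** on a dataframe yields column-name/Series pairs, that assign replaces those columns in their existing positions, and that the original df is not modified.

```python
import pandas as pd

df = pd.DataFrame({'age': [42, 24],
                   'preTestScore': [-999, 2],
                   'postTestScore': [2, 2]})
cols = ['preTestScore', 'postTestScore']
new_cols = df[cols].mask(df[cols[0]].eq(-999))

# Unpacking a DataFrame like a mapping yields column name -> Series pairs
as_kwargs = {**new_cols}
print(list(as_kwargs))   # ['preTestScore', 'postTestScore']

# assign() receives those pairs as keyword arguments and returns a new frame
result = df.assign(**new_cols)
print(df['preTestScore'].tolist())              # [-999, 2] (df is unchanged)
print(result['preTestScore'].isna().tolist())   # [True, False]
```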

In Place

cols = ['preTestScore', 'postTestScore']
df[cols] = df[cols].mask(df[cols[0]].eq(-999))
df

  first_name last_name  age  preTestScore  postTestScore
0      Jason    Miller   42           NaN            NaN
1      Molly  Jacobson   52           NaN            NaN
2       Tina       Ali   36           NaN            NaN
3       Jake    Milner   24           2.0            2.0
4        Amy     Cooze   73           1.0         -999.0
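For reference, DataFrame.where (not used in either answer) is the complement of mask: it keeps values where the condition is True. A sketch, under that swap-in, showing that negating the flag gives the same result as the mask version before writing it back in place.

```python
import pandas as pd

df = pd.DataFrame({'preTestScore': [-999, -999, -999, 2, 1],
                   'postTestScore': [2, 2, -999, 2, -999]})
cols = ['preTestScore', 'postTestScore']
flag = df[cols[0]].eq(-999)

# mask() blanks rows where the condition is True ...
via_mask = df[cols].mask(flag)
# ... while where() keeps rows where it is True, so the flag is negated
via_where = df[cols].where(~flag)

print(via_mask.equals(via_where))   # True (equals treats aligned NaNs as equal)

# Write the result back in place, as in the answer above
df[cols] = via_where
```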

Answer 1: 3 votes, accepted, 2018-10-19 12:40:48
Answer 2: 2 votes, 2018-10-19 12:42:41