Python newbie here. I'm sure this is a trivial question, but after a bit of Google-fu I still haven't found a solution. So here goes: if I have a dataframe such as this:
import numpy as np
import pandas as pd

raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
            'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],
            'age': [42, 52, 36, 24, 73],
            'preTestScore': [-999, -999, -999, 2, 1],
            'postTestScore': [2, 2, -999, 2, -999]}
df = pd.DataFrame(raw_data, columns=['first_name', 'last_name', 'age', 'preTestScore', 'postTestScore'])
How do I go about performing the following action: if preTestScore == -999, then replace both preTestScore and postTestScore with NaN?
I can replace every -999 with NaN by using df.replace(-999, np.nan), but here the replacement across two columns has to depend on the value in just one of them.
Thank you kindly
Use loc with a boolean mask and a list of the columns to set to NaN:
df.loc[df['preTestScore'] == -999, ['preTestScore','postTestScore']] = np.nan
print (df)
  first_name last_name  age  preTestScore  postTestScore
0      Jason    Miller   42           NaN            NaN
1      Molly  Jacobson   52           NaN            NaN
2       Tina       Ali   36           NaN            NaN
3       Jake    Milner   24           2.0            2.0
4        Amy     Cooze   73           1.0         -999.0
Detail: the boolean mask is True for the rows where preTestScore is -999:
print (df['preTestScore'] == -999)
0     True
1     True
2     True
3    False
4    False
Name: preTestScore, dtype: bool
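If you need this in more than one place, the loc assignment above could be wrapped in a small function. This is only a sketch: blank_rows_where and its parameter names are made up for illustration and are not part of pandas. It casts the target columns to float before assigning, on the assumption that you want to avoid relying on implicit int-to-float upcasting (NaN is a float, and newer pandas versions may warn about or reject that upcast), and it returns a new frame rather than modifying its input.

```python
import numpy as np
import pandas as pd


def blank_rows_where(df, key_col, sentinel, target_cols):
    """Return a copy of df with target_cols set to NaN where key_col == sentinel.

    Hypothetical helper for illustration only; not a pandas function.
    """
    # astype returns a new frame, so the caller's df is left untouched.
    out = df.astype({col: "float64" for col in target_cols})
    # Boolean mask selects the rows; the list selects the columns.
    out.loc[out[key_col] == sentinel, target_cols] = np.nan
    return out


df = pd.DataFrame({
    "first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "preTestScore": [-999, -999, -999, 2, 1],
    "postTestScore": [2, 2, -999, 2, -999],
})
result = blank_rows_where(df, "preTestScore", -999,
                          ["preTestScore", "postTestScore"])
```

Note that the -999 in the last row of postTestScore stays as it is, because the condition only looks at preTestScore.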
Alternatively, use pandas.DataFrame.mask. Inline (returns a new dataframe):
cols = ['preTestScore', 'postTestScore']
df.assign(**df[cols].mask(df[cols[0]].eq(-999)))
  first_name last_name  age  preTestScore  postTestScore
0      Jason    Miller   42           NaN            NaN
1      Molly  Jacobson   52           NaN            NaN
2       Tina       Ali   36           NaN            NaN
3       Jake    Milner   24           2.0            2.0
4        Amy     Cooze   73           1.0         -999.0
I use cols to keep from having to write out the long column names. cols[0] is a shortcut for writing 'preTestScore'.
df[cols].mask(df[cols[0]].eq(-999)) will make both columns np.nan when preTestScore is -999.
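To see what mask does on its own, here is a minimal sketch on a made-up three-row frame (the names scores, pre, and post are just for this example). A boolean Series passed to DataFrame.mask is matched up by row index, so a True entry blanks that row in every column of the frame it is called on.

```python
import pandas as pd

# Toy data, not the frame from the question.
scores = pd.DataFrame({"pre": [-999, 2, 1], "post": [2, 2, -999]})

# True only for the first row, where "pre" holds the sentinel.
row_is_missing = scores["pre"].eq(-999)

# mask returns a new frame; rows where the condition is True become NaN
# in both columns, and the integer columns come back as float.
masked = scores.mask(row_is_missing)
```

The original scores frame is unchanged, and the -999 in the last row of post survives because the condition there is False.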
I use assign to produce a dataframe with the new columns without overwriting the old dataframe. If you want to persist this new dataframe, assign the result to a name. You can even reuse the old name: df = df.assign(**df[cols].mask(df[cols[0]].eq(-999)))
assign takes keyword arguments, which you can pass by unpacking a dictionary with a double splat (**kwargs). Conveniently, when a dataframe is used in a dictionary context, it unpacks with column names as the keywords and columns as the values, exactly as we want them.
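The unpacking behaviour can be checked in isolation. In this small sketch, show_kwargs and the frame with columns a and b are made up for illustration; the point is only that ** on a dataframe hands each column to the function as a keyword argument holding a Series. This relies on the column names being strings, since keyword names must be strings.

```python
import pandas as pd

frame = pd.DataFrame({"a": [1, 2], "b": [3, 4]})


def show_kwargs(**kwargs):
    """Hypothetical helper: report the type of each keyword argument."""
    return {name: type(value).__name__ for name, value in kwargs.items()}


# Each column name becomes a keyword, each column a Series.
unpacked = show_kwargs(**frame)

# assign accepts the same unpacking: here every column is replaced by
# ten times its value, and a new frame is returned.
replaced = frame.assign(**(frame * 10))
```

Here unpacked is {'a': 'Series', 'b': 'Series'}, and frame itself still holds the original values.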
In place (overwrites the two columns of df directly):
cols = ['preTestScore', 'postTestScore']
df[cols] = df[cols].mask(df[cols[0]].eq(-999))
df
  first_name last_name  age  preTestScore  postTestScore
0      Jason    Miller   42           NaN            NaN
1      Molly  Jacobson   52           NaN            NaN
2       Tina       Ali   36           NaN            NaN
3       Jake    Milner   24           2.0            2.0
4        Amy     Cooze   73           1.0         -999.0