I need to create a new variable from a column I have called 'baths': rows with 1 bath should map to 0, and rows with more than 1 bath should map to 1. How would I do this in Python? The baths column has 932 rows, with values ranging from 1 to 5 in increments of 0.5.
I tried to use pd.get_dummies on the column but it returned:
baths_1.0 baths_1.5 baths_2.0 baths_2.5 baths_3.0 baths_3.5 baths_4.0 baths_4.5 baths_5.0
I just want one column returned. New to this so any help is great thanks.
Here is my code:
sac = pd.read_csv('sacramento.csv')
df = pd.get_dummies(sac,columns= ['baths'])
df
data sample:
city zip beds baths sqft type price latitude
1 SACRAMENTO z95838 2 1.0 836 Residential 59222 38.631913
2 SACRAMENTO z95823 3 2.0 1167 Residential 68212 38.478902
3 SACRAMENTO z95815 2 1.0 796 Residential 68880 38.618305
4 SACRAMENTO z95815 2 3.0 852 Residential 69307 38.616835
5 SACRAMENTO z95824 2 2.0 797 Residential 81900 38.519470
Using get_dummies will turn each unique value into its own category (which will yield an unwanted result). What you should do is select a threshold to transform a numeric column to a binary column.
There are multiple ways of doing this, including DataFrame.loc, but this can be done in one line using numpy.where or any other case-like function.
import numpy as np
df['baths_dummy'] = np.where(df['baths'] <= 1, 0, 1)
Please note: you may need to be more specific if you have NaN values in df['baths'], because NaN <= 1 evaluates to False and those rows would be assigned 1.
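To illustrate the NaN caveat, here is a minimal sketch. The small DataFrame is made up for demonstration (it is not the Sacramento data), and keeping missing values as missing is only one possible policy; you might prefer to drop or fill them instead.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the NaN stands in for a missing bath count.
df = pd.DataFrame({'baths': [1.0, 2.0, np.nan, 1.5, 1.0]})

# NaN <= 1 is False, so the plain np.where call labels the missing row as 1.
naive = np.where(df['baths'] <= 1, 0, 1)

# Alternative: build the flag from the comparison, then blank out missing rows.
# The nullable 'Int64' dtype can hold 0, 1 and <NA> in the same column.
df['baths_dummy'] = (df['baths'] > 1).astype('Int64')
df.loc[df['baths'].isna(), 'baths_dummy'] = pd.NA

print(naive)
print(df)
```

With this approach the row with the missing bath count stays missing in baths_dummy instead of silently becoming 1.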
No dummies needed. Just apply a lambda function:
df['baths_dummy'] = df['baths'].apply(lambda x: 0 if x <= 1 else 1)
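A row-wise apply is not strictly required here. As a hedged alternative, the comparison can be done on the whole column at once and the resulting booleans cast to integers. The frame below is a hypothetical stand-in that reuses the five baths values from the data sample in the question.

```python
import pandas as pd

# Stand-in for the real data: baths values from the question's sample rows.
sac = pd.DataFrame({'baths': [1.0, 2.0, 1.0, 3.0, 2.0]})

# True where there is more than one bath; casting to int gives 1/0.
sac['baths_dummy'] = (sac['baths'] > 1).astype(int)

print(sac)
```

This keeps the original baths column intact and adds the single 0/1 column the question asks for.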