
Use get_dummies on numerical data

I need to create a new variable from a column called 'baths': observations with 1 bath should map to 0, and those with more than 1 bath should map to 1. How would I do this in Python? The baths column has 932 rows, with values ranging from 1 to 5 in increments of 0.5.

I tried to use pd.get_dummies on the column but it returned:

baths_1.0 baths_1.5 baths_2.0 baths_2.5 baths_3.0 baths_3.5 baths_4.0 baths_4.5 baths_5.0

I just want one column returned. New to this so any help is great thanks.

Here is my code:

import pandas as pd

sac = pd.read_csv('sacramento.csv')
df = pd.get_dummies(sac, columns=['baths'])
df

data sample:

    city        zip     beds  baths  sqft  type         price  latitude
1   SACRAMENTO  z95838  2     1.0    836   Residential  59222  38.631913
2   SACRAMENTO  z95823  3     2.0    1167  Residential  68212  38.478902
3   SACRAMENTO  z95815  2     1.0    796   Residential  68880  38.618305
4   SACRAMENTO  z95815  2     3.0    852   Residential  69307  38.616835
5   SACRAMENTO  z95824  2     2.0    797   Residential  81900  38.519470

get_dummies turns each unique value into its own indicator column, which is not what you want here. Instead, pick a threshold and use it to convert the numeric column into a single binary column.

There are multiple ways of doing this, including DataFrame.loc, but it can be done in one line using numpy.where or any other case-like function.

import numpy as np
df['baths_dummy'] = np.where(df['baths'] <= 1, 0, 1)

Please note: you may need to be more specific if df['baths'] contains NaN values, because NaN <= 1 evaluates to False and those rows would be assigned 1.
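
As a rough sketch of one way to handle that case, the snippet below keeps missing values missing instead of letting them fall into the 1 bucket. The sample DataFrame is made up for illustration, and using the nullable Int64 dtype is an assumption about what you want the output column to look like.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the NaN row stands in for a missing bath count.
df = pd.DataFrame({'baths': [1.0, 2.5, np.nan, 1.5, 1.0]})

# np.where alone would map NaN to 1, because NaN <= 1 evaluates to False.
flag = pd.Series(np.where(df['baths'] <= 1, 0, 1), index=df.index)

# Blank out the rows where the input is missing, then convert to the
# nullable Int64 dtype so the column can hold 0, 1 and <NA> together.
df['baths_dummy'] = flag.where(df['baths'].notna()).astype('Int64')
print(df)
```

If you would rather drop or fill the missing rows first, df.dropna(subset=['baths']) or df['baths'].fillna(...) before the np.where call would work as well.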

No dummies needed. Just apply a lambda function:

df['baths_dummy'] = df['baths'].apply(lambda x: 0 if x <= 1 else 1)


 