pandas - multiple 'yes/no' dummy variables
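I have a DataFrame with several categorical variables that I need to convert to dummy variables. Gender and region (4 types) are easy to handle with pd.get_dummies. After those, however, I have several yes/no variables. How can I make the dummy yes and no columns include the variable name? For example, the "married" variable would become married_yes and married_no.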
Here is my current code and a screenshot of the first five rows:
genderdummy=pd.get_dummies(bank_df['gender'])
regiondummy=pd.get_dummies(bank_df['region'])
marrieddummy=pd.get_dummies(bank_df['married'])
cardummy=pd.get_dummies(bank_df['car'])
savingsdummy=pd.get_dummies(bank_df['savings_acct'])
currentdummy=pd.get_dummies(bank_df['current_acct'])
mortgagedummy=pd.get_dummies(bank_df['mortgage'])
pepdummy=pd.get_dummies(bank_df['pep'])
newdata_df=pd.concat([genderdummy,regiondummy,marrieddummy,cardummy,savingsdummy,currentdummy,mortgagedummy,pepdummy], axis=1)
newdata_df.head()
So, following the suggestion, this is what I have now:
## HW Part 6: Converting Categorical Variables and Exporting Data
genderdummy=pd.get_dummies(bank_df['gender'])
regiondummy=pd.get_dummies(bank_df['region'])
dummy_vars = [bank_df('married'), bank_df('car'),bank_df('savings_acct'),bank_df('current_acct'),bank_df('mortgage'),bank_df('pep')]
pd.get_dummies(bank_df[dummy_vars])
newdata_df=pd.concat([genderdummy,regiondummy,dummy_vars], axis=1)
newdata_df.head()
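If you change your approach, pandas will do this for you automatically. You just need to call pd.get_dummies on a DataFrame instead of a Series; each resulting column is then prefixed with the name of the column it came from: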
import numpy as np
import pandas as pd
# Define sample data and columns for dummy variables
df = pd.DataFrame(np.random.choice(['yes', 'no'], size=(6, 3)), columns=['gender', 'region', 'married'])
dummy_vars = ['gender', 'married']
# Create dummy variables
pd.get_dummies(df[dummy_vars])
   gender_no  gender_yes  married_no  married_yes
0          0           1           1            0
1          1           0           0            1
2          0           1           1            0
3          1           0           1            0
4          1           0           1            0
5          0           1           1            0
Alternatively, you can use the prefix parameter to specify the prefixes explicitly:
pd.get_dummies(df[dummy_vars], prefix=dummy_vars)
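To illustrate how the prefix argument can be customized, here is a minimal sketch. The sample values and the short prefixes "g" and "m" are made up for illustration; prefix also accepts a dict mapping column names to prefixes, and prefix_sep sets the separator. Passing dtype=int is an assumption about the desired output: recent pandas versions return boolean columns by default, and this keeps the 0/1 form shown above.

```python
import pandas as pd

# Hypothetical sample data; column names mirror the example above.
df = pd.DataFrame({
    "gender": ["yes", "no", "yes"],
    "married": ["no", "no", "yes"],
})

# prefix may be a dict mapping each encoded column to its prefix;
# prefix_sep is the separator placed between prefix and category value.
dummies = pd.get_dummies(
    df,
    prefix={"gender": "g", "married": "m"},
    prefix_sep="_",
    dtype=int,  # keep 0/1 integers instead of booleans
)
print(dummies.columns.tolist())
```

Within each source column, the dummy columns are ordered by sorted category value, so "no" comes before "yes".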
Update:
With your variables, it should look like this:
genderdummy = pd.get_dummies(bank_df['gender'])
regiondummy = pd.get_dummies(bank_df['region'])
dummy_vars = ['married', 'car', 'savings_acct', 'current_acct', 'mortgage', 'pep']
other_dummies = pd.get_dummies(bank_df[dummy_vars])
newdata_df = pd.concat([genderdummy, regiondummy, other_dummies], axis=1)
newdata_df.head()
Note that dummy_vars is just a list of column names from your bank_df, not the columns themselves.
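As a possible simplification, the separate get_dummies calls and the concat could be collapsed into a single call using the columns parameter. This is only a sketch: the bank_df below is a small hypothetical stand-in with made-up values, and unlike the code above it also prefixes the gender and region dummies with their column names.

```python
import pandas as pd

# Hypothetical stand-in for bank_df; values are made up for illustration.
bank_df = pd.DataFrame({
    "age": [48, 40, 51],
    "gender": ["FEMALE", "MALE", "FEMALE"],
    "region": ["INNER_CITY", "TOWN", "RURAL"],
    "married": ["NO", "YES", "YES"],
    "car": ["NO", "YES", "YES"],
})

# Columns listed here are replaced by prefixed dummy columns;
# all other columns (here: age) are kept unchanged.
categorical = ["gender", "region", "married", "car"]
newdata_df = pd.get_dummies(bank_df, columns=categorical, dtype=int)
print(newdata_df.columns.tolist())
```

With this form there is no need to concatenate the pieces manually, and non-categorical columns stay in the result.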
Use the prefix parameter of pandas.get_dummies():
df = pd.DataFrame({'text':['cat', 'dog','cat','dog']})
df = pd.get_dummies(df['text'], prefix='text')
print(df)
Output:
   text_cat  text_dog
0         1         0
1         0         1
2         1         0
3         0         1