Columns must be same length as key. Why am I getting this error? Can anyone help me out?
cols = ["Gender", "Married", "Education", "Self_Employed", "Property_Area", "Loan_Status", "Dependents"]
for col in cols:
    df[col] = pd.get_dummies(df[col], drop_first=True)
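get_dummies returns a DataFrame that may have many columns, even if you use drop_first=True (a column with k distinct values still produces k - 1 columns). A DataFrame with more than one column cannot be assigned to the single key df[col], and that is what raises this error. You need join() to add all the new columns, and later you can remove the old column with del df[col].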
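A short sketch reproducing the error, with made-up data (the Property_Area values below are assumed for illustration): a column with three distinct values still yields two dummy columns with drop_first=True, so assigning them to the single key df[col] fails, while join() followed by dropping the original column works.

```python
import pandas as pd

# Hypothetical data: Property_Area has three distinct values (assumed for illustration)
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Property_Area": ["Urban", "Rural", "Semiurban"],
})

# Categories are sorted (Rural, Semiurban, Urban); drop_first removes Rural
dummies = pd.get_dummies(df["Property_Area"], drop_first=True)
print(dummies.shape)  # 3 rows, 2 columns

error_message = None
try:
    df["Property_Area"] = dummies  # two columns into one existing key
except ValueError as err:
    error_message = str(err)
print(error_message)

# Fix: join the dummy columns, then drop the original column
df = df.join(dummies).drop(columns="Property_Area")
print(df.columns.tolist())
```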
Minimal example:
import pandas as pd
data = {
'Gender': ['Female','Male','Female'],
'Married': ['Yes','No','Yes'],
}
df = pd.DataFrame(data)
print('\n--- before ---\n')
print(df)
for col in df.columns:
    result = pd.get_dummies(df[col])
    df = df.join(result)
    #del df[col]
print('\n--- after ---\n')
print(df)
Result:
--- before ---

   Gender Married
0  Female     Yes
1    Male      No
2  Female     Yes

--- after ---

   Gender Married Female Male No Yes
0  Female     Yes      1    0  0   1
1    Male      No      0    1  1   0
2  Female     Yes      1    0  0   1
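Newer pandas versions (2.0 and later) return boolean dummy columns by default, so the same loop prints True/False instead of 1/0. A sketch of the same loop, assuming you want the 1/0 form shown above, passes dtype=int to get_dummies():

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["Female", "Male", "Female"],
    "Married": ["Yes", "No", "Yes"],
})

# Iterates over the original column Index, even though df is reassigned inside
for col in df.columns:
    # dtype=int forces 1/0 values regardless of the version's default dtype
    result = pd.get_dummies(df[col], dtype=int)
    df = df.join(result)

print(df)
```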
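Different columns can contain the same values, e.g. Yes and No, which would give duplicate dummy column names (join() refuses overlapping column names unless you pass suffixes), so you may need to add a prefix to the new columns.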
result = pd.get_dummies(df[col], prefix=col)
or, alternatively:
result = result.add_prefix(f"{col}_")
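The sketch below (made-up data) shows why the prefix matters: two columns sharing the values Yes/No produce dummies with the same names, and the second join() raises a ValueError about overlapping columns. It also checks that prefix=col and add_prefix(f"{col}_") give identical names, since get_dummies() joins the prefix with its default separator "_".

```python
import pandas as pd

df = pd.DataFrame({
    "Married": ["Yes", "No", "Yes"],
    "Self_Employed": ["No", "Yes", "Yes"],
})

# Without a prefix both columns produce dummies named "No" and "Yes"
plain = df.copy()
overlap_error = None
try:
    for col in df.columns:
        plain = plain.join(pd.get_dummies(df[col], dtype=int))
except ValueError as err:
    overlap_error = str(err)  # e.g. "columns overlap but no suffix specified: ..."
print(overlap_error)

# The two ways of prefixing yield the same column names
prefix_names = []
add_prefix_names = []
for col in df.columns:
    prefix_names += pd.get_dummies(df[col], prefix=col).columns.tolist()
    add_prefix_names += pd.get_dummies(df[col]).add_prefix(f"{col}_").columns.tolist()
print(prefix_names == add_prefix_names)
```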
Minimal example:
import pandas as pd
data = {
'Gender': ['Female','Male','Female'],
'Married': ['Yes','No','Yes'],
'Self_Employed': ['No','Yes','Yes'],
}
df = pd.DataFrame(data)
print('\n--- before ---\n')
print(df)
for col in df.columns:
    result = pd.get_dummies(df[col], prefix=col)
    #result = result.add_prefix(f"{col}_")
    df = df.join(result)
    #del df[col]
print('\n--- after ---\n')
print(df.to_string())
Result:
--- before ---

   Gender Married Self_Employed
0  Female     Yes            No
1    Male      No           Yes
2  Female     Yes           Yes

--- after ---

   Gender Married Self_Employed Gender_Female Gender_Male Married_No Married_Yes Self_Employed_No Self_Employed_Yes
0  Female     Yes            No             1           0          0           1                1                 0
1    Male      No           Yes             0           1          1           0                0                 1
2  Female     Yes           Yes             1           0          0           1                0                 1
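Applied to the loop in the question, a sketch might join the prefixed dummies with drop_first=True and then delete each original column, as the commented-out del df[col] suggests. The sample values below are made up and only a subset of the question's cols is used; the real loan DataFrame is assumed to hold these columns as text.

```python
import pandas as pd

# Made-up sample standing in for the question's DataFrame
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Married": ["Yes", "No", "Yes"],
    "Property_Area": ["Urban", "Rural", "Semiurban"],
    "Dependents": ["0", "1", "3+"],
})

cols = ["Gender", "Married", "Property_Area", "Dependents"]
for col in cols:
    # k categories -> k - 1 columns, named like "Property_Area_Urban"
    result = pd.get_dummies(df[col], prefix=col, drop_first=True, dtype=int)
    df = df.join(result)
    del df[col]  # remove the original text column

print(df.columns.tolist())
```

Each iteration appends the new columns at the end, so the dummies appear in the order of cols.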
EDIT:
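You can do the same without the for-loop by passing columns= to get_dummies(). With columns= the new columns are prefixed with the original column names by default.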
all_results = pd.get_dummies(df, columns=df.columns) #, drop_first=True)
df = df.join(all_results)
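get_dummies(df, columns=...) already drops the columns it encodes and keeps all other columns, so the join() above places the dummies next to the original text columns. If the originals should be replaced, as in the question, a sketch calling it directly (made-up data; LoanAmount is a hypothetical extra column) could look like:

```python
import pandas as pd

# Made-up sample; the real DataFrame would also hold the other columns from cols
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Married": ["Yes", "No", "Yes"],
    "LoanAmount": [120, 66, 95],  # hypothetical numeric column, left untouched
})

cols = ["Gender", "Married"]
# Encodes only the listed columns; untouched columns come first, then the dummies
encoded = pd.get_dummies(df, columns=cols, drop_first=True, dtype=int)
print(encoded.columns.tolist())
```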