Columns must be same length as key. Why am I getting this error? Can anyone help me out?
cols = ["Gender", "Married", "Education", "Self_Employed", "Property_Area", "Loan_Status", "Dependents"]
for col in cols:
    df[col] = pd.get_dummies(df[col], drop_first=True)
get_dummies returns a DataFrame, which may have many columns (even if you use drop_first=True), so you need join() to add all the new columns.
Later you can use del df[col] to remove the old column.
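To see why the assignment in the question fails, here is a minimal sketch. The sample values are made up for illustration; only the column name comes from the question. It shows that get_dummies returns a DataFrame with one column per category, and that such a result cannot be stored under a single column key:

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({'Property_Area': ['Urban', 'Rural', 'Semiurban']})

dummies = pd.get_dummies(df['Property_Area'], drop_first=True)
print(type(dummies).__name__)  # a DataFrame, not a Series
print(dummies.shape)           # 3 rows, 2 columns (first category dropped)

# Assigning a multi-column DataFrame to one column key raises ValueError.
try:
    df['Property_Area'] = dummies
    assignment_failed = False
except ValueError as exc:
    assignment_failed = True
    print('ValueError:', exc)
```

The exact error message may differ between pandas versions, so the sketch only relies on the exception type.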
Minimal example:
import pandas as pd

data = {
    'Gender': ['Female','Male','Female'],
    'Married': ['Yes','No','Yes'],
}

df = pd.DataFrame(data)

print('\n--- before ---\n')
print(df)

for col in df.columns:
    result = pd.get_dummies(df[col])
    df = df.join(result)
    #del df[col]

print('\n--- after ---\n')
print(df)
Result:

--- before ---

   Gender Married
0  Female     Yes
1    Male      No
2  Female     Yes

--- after ---

   Gender Married  Female  Male  No  Yes
0  Female     Yes       1     0   0    1
1    Male      No       0     1   1    0
2  Female     Yes       1     0   0    1
Different columns may contain the same values, e.g. Yes and No, so you may need to add prefixes to the new columns:
result = pd.get_dummies(df[col], prefix=col)
Or alternatively:
result = result.add_prefix(f"{col}_")
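As a rough illustration of why the prefix matters, the sketch below uses made-up sample data in which two columns share the values Yes and No. Without a prefix the dummy columns collide and join() raises an error. With either prefixing approach the names become unambiguous:

```python
import pandas as pd

# Hypothetical sample data: both columns contain the values 'Yes' and 'No'.
df = pd.DataFrame({
    'Married': ['Yes', 'No', 'Yes'],
    'Self_Employed': ['No', 'Yes', 'Yes'],
})

married = pd.get_dummies(df['Married'])
employed = pd.get_dummies(df['Self_Employed'])

# Both results have columns named 'No' and 'Yes', so join() refuses to combine them.
try:
    married.join(employed)
    overlap_error = False
except ValueError as exc:
    overlap_error = True
    print('ValueError:', exc)

# Either way of prefixing gives the same unambiguous column names.
with_prefix = pd.get_dummies(df['Married'], prefix='Married')
with_add_prefix = pd.get_dummies(df['Married']).add_prefix('Married_')
print(list(with_prefix.columns))
print(list(with_add_prefix.columns))
```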
Minimal example:
import pandas as pd

data = {
    'Gender': ['Female','Male','Female'],
    'Married': ['Yes','No','Yes'],
    'Self_Employed': ['No','Yes','Yes'],
}

df = pd.DataFrame(data)

print('\n--- before ---\n')
print(df)

for col in df.columns:
    result = pd.get_dummies(df[col], prefix=col)
    #result = result.add_prefix(f"{col}_")
    df = df.join(result)
    #del df[col]

print('\n--- after ---\n')
print(df.to_string())
Result:

--- before ---

   Gender Married Self_Employed
0  Female     Yes            No
1    Male      No           Yes
2  Female     Yes           Yes

--- after ---

   Gender Married Self_Employed  Gender_Female  Gender_Male  Married_No  Married_Yes  Self_Employed_No  Self_Employed_Yes
0  Female     Yes            No              1            0           0            1                 1                  0
1    Male      No           Yes              0            1           1            0                 0                  1
2  Female     Yes           Yes              1            0           0            1                 0                  1
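Putting the pieces together, the loop from the question could be rewritten roughly as follows. This is only a sketch: the sample rows and the LoanAmount column are hypothetical, and only a subset of the question's column names is used. It combines prefix=, drop_first=True, join() and del df[col]:

```python
import pandas as pd

# Hypothetical sample rows; the categorical column names come from the question.
df = pd.DataFrame({
    'Gender': ['Female', 'Male', 'Female'],
    'Married': ['Yes', 'No', 'Yes'],
    'Property_Area': ['Urban', 'Rural', 'Semiurban'],
    'LoanAmount': [120, 66, 141],  # hypothetical numeric column, left untouched
})

cols = ['Gender', 'Married', 'Property_Area']

for col in cols:
    # One-hot encode the column; drop_first=True omits the first category.
    result = pd.get_dummies(df[col], prefix=col, drop_first=True)
    # Add all new columns, then remove the original one.
    df = df.join(result)
    del df[col]

print(df.to_string())
```

Here df.join() returns a new DataFrame each time, so the original column is removed only from the joined copy.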
EDIT:
You can do the same without a for-loop by using columns= in get_dummies():
all_results = pd.get_dummies(df, columns=df.columns) #, drop_first=True)
df = df.join(all_results)
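If the original columns are not needed afterwards, the join() may be unnecessary. When columns= is given, get_dummies() replaces the listed columns with their dummy columns and passes the remaining columns through unchanged. A small sketch, with a hypothetical LoanAmount column added to show the pass-through:

```python
import pandas as pd

df = pd.DataFrame({
    'Gender': ['Female', 'Male', 'Female'],
    'Married': ['Yes', 'No', 'Yes'],
    'LoanAmount': [120, 66, 141],  # hypothetical numeric column
})

# Columns listed in columns= are replaced by prefixed dummy columns;
# all other columns are returned unchanged.
encoded = pd.get_dummies(df, columns=['Gender', 'Married'])
print(encoded.to_string())
```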