
Why am I getting the error "Columns must be same length as key"? Can anyone help me out?

cols = ["Gender", "Married", "Education", "Self_Employed", "Property_Area", "Loan_Status", "Dependents"]
for col in cols:
    df[col] = pd.get_dummies(df[col], drop_first=True)

get_dummies() returns a DataFrame that may have several columns (even if you use drop_first=True), so it cannot be assigned to the single column df[col]. You need join() to add all the new columns to df.

Afterwards you can use del df[col] to remove the original column.
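To see where the error comes from, it may help to look at how many columns get_dummies() returns for each input column. The sketch below uses made-up sample values (the question does not show its data) and assumes Property_Area has three distinct values, so that drop_first=True still leaves two dummy columns.

```python
import pandas as pd

# Made-up sample data; the real dataset from the question is not shown.
df = pd.DataFrame({
    "Married": ["Yes", "No", "Yes"],
    "Property_Area": ["Urban", "Rural", "Semiurban"],
})

# Record which dummy columns each source column produces.
dummy_columns = {}
for col in df.columns:
    dummies = pd.get_dummies(df[col], drop_first=True)
    dummy_columns[col] = list(dummies.columns)
    print(col, "->", dummy_columns[col])

# Assigning a two-column DataFrame to a single column key fails.
error_message = None
try:
    df["Property_Area"] = pd.get_dummies(df["Property_Area"], drop_first=True)
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

A two-valued column such as Married leaves only one dummy column after drop_first=True, which is why the assignment can appear to work for some columns and fail for others.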


Minimal example:

import pandas as pd

data = {
    'Gender':  ['Female','Male','Female'], 
    'Married': ['Yes','No','Yes'], 
}

df = pd.DataFrame(data)

print('\n--- before ---\n')
print(df)

for col in df.columns:
    result = pd.get_dummies(df[col])

    df = df.join(result)

    #del df[col]
    
print('\n--- after ---\n')
print(df)

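Result (in pandas 2.0 and later the dummy columns are boolean by default, so they print as True/False unless you pass dtype=int):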

--- before ---

   Gender Married
0  Female     Yes
1    Male      No
2  Female     Yes

--- after ---

   Gender Married  Female  Male  No  Yes
0  Female     Yes       1     0   0    1
1    Male      No       0     1   1    0
2  Female     Yes       1     0   0    1

Different columns may contain the same values (e.g. Yes, No), which would give the new columns identical names, so you may need to add prefixes to the new columns.

    result = pd.get_dummies(df[col], prefix=col)

Alternatively:

    result = result.add_prefix(f"{col}_")
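As a rough illustration of why the prefix matters, the sketch below (with made-up sample values) joins the dummies of two Yes/No columns without prefixes, which makes join() complain about overlapping column names, and then checks that prefix= and add_prefix() produce the same column names.

```python
import pandas as pd

# Made-up sample data with two columns that share the values Yes/No.
df = pd.DataFrame({
    "Married": ["Yes", "No", "Yes"],
    "Self_Employed": ["No", "Yes", "Yes"],
})

# The first join works and adds columns named "No" and "Yes".
plain = df.join(pd.get_dummies(df["Married"]))

# The second join would add "No" and "Yes" again, so join() raises.
collision = None
try:
    plain.join(pd.get_dummies(df["Self_Employed"]))
except ValueError as exc:
    collision = str(exc)
print(collision)

# Both ways of prefixing give the same column names.
with_prefix = pd.get_dummies(df["Self_Employed"], prefix="Self_Employed")
with_add_prefix = pd.get_dummies(df["Self_Employed"]).add_prefix("Self_Employed_")
print(list(with_prefix.columns))
print(list(with_add_prefix.columns))
```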

Minimal example:

import pandas as pd

data = {
    'Gender':  ['Female','Male','Female'], 
    'Married': ['Yes','No','Yes'],
    'Self_Employed': ['No','Yes','Yes'],
}

df = pd.DataFrame(data)

print('\n--- before ---\n')
print(df)

for col in df.columns:
    result = pd.get_dummies(df[col], prefix=col)
    #result = result.add_prefix(f"{col}_")

    df = df.join(result)

    #del df[col]

print('\n--- after ---\n')
print(df.to_string())

Result:

--- before ---

   Gender Married Self_Employed
0  Female     Yes            No
1    Male      No           Yes
2  Female     Yes           Yes

--- after ---

   Gender Married Self_Employed  Gender_Female  Gender_Male  Married_No  Married_Yes  Self_Employed_No  Self_Employed_Yes
0  Female     Yes            No              1            0           0            1                 1                  0
1    Male      No           Yes              0            1           1            0                 0                  1
2  Female     Yes           Yes              1            0           0            1                 0                  1
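Putting the pieces together for a list of columns like the one in the question might look like the sketch below. The helper name one_hot_encode, the LoanAmount column, and all sample values are assumptions made for illustration. It uses drop(columns=col) in place of del df[col], and dtype=int so the dummies are 0/1 regardless of the pandas version.

```python
import pandas as pd

def one_hot_encode(df, cols, drop_first=False):
    """Hypothetical helper: replace each column in cols with prefixed dummy columns."""
    for col in cols:
        dummies = pd.get_dummies(df[col], prefix=col, drop_first=drop_first, dtype=int)
        # Add the new columns, then remove the original one.
        df = df.join(dummies).drop(columns=col)
    return df

# Made-up sample data; LoanAmount stands in for a column that is not encoded.
df = pd.DataFrame({
    "Gender": ["Female", "Male", "Female"],
    "Property_Area": ["Urban", "Rural", "Semiurban"],
    "LoanAmount": [120, 66, 141],
})

encoded = one_hot_encode(df, ["Gender", "Property_Area"], drop_first=True)
print(encoded.to_string())
```

Because the loop reassigns df inside the function, the caller's original DataFrame is left unchanged.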

EDIT:

You can do the same without a for-loop by using columns= in get_dummies()

all_results = pd.get_dummies(df, columns=df.columns) #, drop_first=True)
df = df.join(all_results)
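If the goal is to replace the original columns instead of keeping them next to the dummies, the join() is probably not needed: when given a whole DataFrame, get_dummies() returns the columns that were not encoded followed by the new dummy columns. A minimal sketch, with a made-up LoanAmount column and sample values:

```python
import pandas as pd

# Made-up sample data; LoanAmount is not encoded.
df = pd.DataFrame({
    "Gender": ["Female", "Male", "Female"],
    "Married": ["Yes", "No", "Yes"],
    "LoanAmount": [120, 66, 141],
})

cols = ["Gender", "Married"]

# The listed columns are replaced by prefixed dummies; other columns are kept.
encoded = pd.get_dummies(df, columns=cols, drop_first=True, dtype=int)
print(encoded)
```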
