
How to encode several columns (but not all columns) of a DataFrame in Python using pandas

I want to build a naive Bayes model using two DataFrames (a train DataFrame and a test DataFrame).

Each DataFrame contains 13 columns, but I only want to encode 5-6 of them from str to int values. How can I encode those 6 columns directly with a single piece of code? I followed this answer:

https://stackoverflow.com/a/37159615/12977554

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    'colors': ["R", "G", "B", "B", "G", "R", "B", "G", "G", "R", "G"],
    'skills': ["Java", "C++", "SQL", "Java", "Python", "Python", "SQL", "C++", "Java", "SQL", "Java"]
})

def encode_df(dataframe):
    le = LabelEncoder()
    for column in dataframe.columns:
        dataframe[column] = le.fit_transform(dataframe[column])
    return dataframe

# encode the dataframe
encode_df(df)

But this only encodes 1 column; what I want is to encode 6 columns with a single piece of code.

You can loop through the columns you want to encode and call fit_transform on each one:

cols = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6']

for col in cols:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col].astype('str'))
    
df
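
As a minimal sketch of the same loop, the fitted encoders can be kept in a dictionary so the integer codes can be mapped back to labels later. The toy data, the extra `age` column, and the `encoders` dictionary are illustrative assumptions, not part of the original question.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame (assumed data): two string columns to encode, one numeric column left alone.
df = pd.DataFrame({
    'colors': ['R', 'G', 'B', 'B', 'G'],
    'skills': ['Java', 'C++', 'SQL', 'Java', 'Python'],
    'age': [31, 25, 40, 28, 35],
})

cols = ['colors', 'skills']   # only these columns are encoded
encoders = {}                 # hypothetical helper: column name -> fitted LabelEncoder

for col in cols:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col].astype('str'))
    encoders[col] = le        # keep the encoder so the mapping is not lost

# Map the integer codes of one column back to the original labels.
original_colors = encoders['colors'].inverse_transform(df['colors'])
```

Columns that are not listed in `cols` (here `age`) are left unchanged.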

Ideally, you want to use the same transformer for both the train and test datasets.
For that, fit each encoder on the train data and use it to transform both:

for col in cols:
    le = LabelEncoder()
    le.fit(df_train[col].astype('str'))
    df_train[col] = le.transform(df_train[col].astype('str'))
    df_test[col] = le.transform(df_test[col].astype('str'))
        
df_test
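
One caveat with the loop above: LabelEncoder.transform raises a ValueError when the test data contains a label that was not present in the train data. A possible alternative, which is a different technique from the one in this answer, is scikit-learn's OrdinalEncoder, which encodes several columns at once and can map unseen categories to a placeholder value. The sketch below assumes scikit-learn 0.24 or later and uses made-up data; the choice of -1 as the placeholder is also an assumption.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Assumed toy data for illustration.
df_train = pd.DataFrame({
    'colors': ['R', 'G', 'B', 'G'],
    'skills': ['Java', 'C++', 'SQL', 'Java'],
})
df_test = pd.DataFrame({
    'colors': ['G', 'Y'],          # 'Y' never appears in df_train
    'skills': ['SQL', 'Java'],
})

cols = ['colors', 'skills']

# Unseen categories are encoded as -1 instead of raising an error.
enc = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1)
enc.fit(df_train[cols].astype('str'))          # fit on the train data only

# Reuse the same fitted encoder for both frames.
df_train[cols] = enc.transform(df_train[cols].astype('str')).astype(int)
df_test[cols] = enc.transform(df_test[cols].astype('str')).astype(int)
```

Because the encoder is fitted once on the train data, the same category always gets the same integer in both DataFrames.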

Have you tried apply() yet? Called on a DataFrame, it passes each selected column to fit_transform in turn:

le = LabelEncoder()
cols = ['colors', 'skills']
df[cols] = df[cols].apply(le.fit_transform)
