Use get_dummies in categorical data

Question

I have to train a decision tree classifier on a dataset. The classifier can't handle categorical data, but this dataset has several columns with categorical values.

I know this can be done with the get_dummies function, but I couldn't get it to work. I first read the dataset like this:

def load_data(fname):
    """Load CSV file"""
    df = pd.read_csv(fname)
    nc = df.shape[1]
    matrix = df.values
    table_X = matrix [:, 2:]
    table_y = matrix [:, 81]
    features_names = df.columns.values[1:]
    target = df.columns.values[81]
    return table_X, table_y

table_X, table_y = load_data("dataset.csv")

pd.get_dummies(table_X)

When I run this, I get the following exception: Exception: Data must be 1-dimensional

What am I doing wrong?

------------------------------- EDIT ------------------------------------

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y = le.fit_transform(table_y)
le.classes_

le.transform(['<200000', '>400000', '[200000,400000]'])

To apply the decision tree algorithm:

from sklearn import tree

dtc_Gini = tree.DecisionTreeClassifier() #criterion='gini'
dtc_Gini1 = dtc_Gini.fit(table_X, y)

But this raises: ValueError: could not convert string to float: 'RL'

Answer 1

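Right after pd.read_csv, call pd.get_dummies(df). It works on the DataFrame directly, whereas the 2-D NumPy array you get from df.values triggers the exception.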
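
A minimal sketch of that idea, assuming a tiny hand-made DataFrame in place of the real CSV. The column names zoning, lot_area and price_range are made up for illustration; only the value 'RL' and the target labels come from the question. The target is split off first so that get_dummies does not one-hot encode it along with the features.

```python
import pandas as pd

# Hypothetical stand-in for the CSV; the column names are assumptions.
df = pd.DataFrame({
    "zoning": ["RL", "RM", "RL", "FV"],       # categorical (string) feature
    "lot_area": [8450, 9600, 11250, 4500],    # numeric feature, left untouched
    "price_range": ["<200000", "<200000", "[200000,400000]", ">400000"],  # target
})

# Split off the target first so get_dummies does not one-hot encode it.
y = df["price_range"]
X = pd.get_dummies(df.drop(columns="price_range"))

# Numeric columns are kept; each string column becomes one column per category.
print(X.columns.tolist())
```

Each string column is replaced by one indicator column per category, named column_value, while numeric columns pass through unchanged.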

Answer 2

Based on this answer: get_dummies(), Exception: Data must be 1-dimensional, it seems you have to convert table_X back to a DataFrame before applying get_dummies(). Alternatively, you can avoid using df.values altogether.
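
The first option, converting the array back to a DataFrame, could look roughly like the sketch below. The data and the column names (zoning, lot_area) are made up for illustration. Note that df.values on mixed columns gives an object array, so infer_objects() is used here to recover the numeric dtypes; otherwise the numeric column would be treated as categorical too.

```python
import pandas as pd

# Hypothetical data standing in for the CSV; the column names are assumptions.
df = pd.DataFrame({
    "zoning": ["RL", "RM", "RL"],
    "lot_area": [8450, 9600, 11250],
})

matrix = df.values  # 2-D NumPy array of dtype object; column labels are lost

# Wrapping the array in a DataFrame restores the labels, and infer_objects()
# recovers numeric dtypes so that only the string column gets encoded.
table_X = pd.DataFrame(matrix, columns=df.columns).infer_objects()
dummies = pd.get_dummies(table_X)

print(dummies.columns.tolist())
```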

To avoid df.values, try this:

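import pandas as pd

def load_data(fname):
    """Load a CSV file and return features and target as pandas objects."""
    df = pd.read_csv(fname)
    # Keep pandas objects (no df.values) so get_dummies can process them.
    table_y = df.iloc[:, 81]
    # Drop the target column from the features so it is not encoded into X.
    table_X = df.iloc[:, 2:].drop(columns=df.columns[81])
    return table_X, table_y

table_X, table_y = load_data("dataset.csv")

# get_dummies returns a new DataFrame, so keep the result.
table_X = pd.get_dummies(table_X)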

And let me know if it works.
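
For completeness, here is a rough end-to-end sketch that combines this with the LabelEncoder and DecisionTreeClassifier steps from the question. The small DataFrame and its column names are assumptions standing in for dataset.csv, and random_state=0 is only there to make the run repeatable.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for dataset.csv; the column names are assumptions.
df = pd.DataFrame({
    "zoning": ["RL", "RM", "RL", "FV", "RM", "RL"],
    "lot_area": [8450, 9600, 11250, 4500, 7200, 14000],
    "price_range": ["<200000", "<200000", "[200000,400000]",
                    ">400000", "<200000", ">400000"],
})

X = pd.get_dummies(df.drop(columns="price_range"))    # features without strings
y = LabelEncoder().fit_transform(df["price_range"])   # integer class labels

clf = DecisionTreeClassifier(random_state=0)  # criterion="gini" is the default
clf.fit(X, y)  # X no longer holds strings such as 'RL'

predictions = clf.predict(X)
print(predictions)
```

Since the features no longer contain strings, fit should not hit the "could not convert string to float" error from the question.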

2 answers

solution1 0 2020-06-11 14:46:24

solution2 0 ACCEPTED 2020-06-11 15:28:02
