
How to format data for a multiclass svm model in sklearn

I have my training data separated into a few folders by category: folder1 is class1 with 50+ data files for that category, and so on. I read in my data, but I'm confused about how to format it for the SVM model so that the shapes are right.

Summary of what I do: I read in the data by iterating over each folder and each file in that folder, adding each file's data frame to a list. For every file, I also append its class (as an integer) to a second list.

Obviously this doesn't give me a single data frame to pass to the model, so how should I format this, or should I do something else?

Possible steps required to fit a support vector model

As I understand it, you already separated your data into different data frames?

That is not necessary for sklearn. Here is an example of how to fit sklearn's SVM on a single data frame, without splitting the data by target class (pseudo code):

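#PseudoCode: assumes "data" is one pandas DataFrame holding all samples,
# with the class label stored in a column named "classification"
from sklearn import svm

classification = data["classification"]      # take the labels out first
data = data.drop(columns="classification")   # the remaining columns are the features

From the official documentation:

clf = svm.SVC()
clf.fit(data, classification)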
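Applied to the folder layout from the question, one way to end up with that single data frame is to read every file, tag its rows with the folder's label, and concatenate everything. The sketch below assumes the files are CSVs, that each row is one sample, and that all files share the same columns; the folder names, the columns "a" and "b", and the helper load_folders are hypothetical. It builds a small demo tree in a temporary directory so it runs on its own.

```python
from pathlib import Path
import tempfile

import pandas as pd
from sklearn import svm


def load_folders(root):
    """Hypothetical helper: read every CSV under root/<class_folder>/ into one DataFrame.

    Assumes each CSV row is one sample and all files share the same columns.
    The folder's position in sorted order becomes its integer class label.
    """
    frames = []
    class_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for label, class_dir in enumerate(class_dirs):
        for csv_file in sorted(class_dir.glob("*.csv")):
            df = pd.read_csv(csv_file)
            df["classification"] = label  # every row of this file gets the folder's label
            frames.append(df)
    # Stack all rows into a single DataFrame with a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)


# Demo tree: three hypothetical class folders, three files each, two rows per file
with tempfile.TemporaryDirectory() as root:
    for name, offset in [("folder1", 0.0), ("folder2", 10.0), ("folder3", 20.0)]:
        folder = Path(root) / name
        folder.mkdir()
        for i in range(3):
            pd.DataFrame({"a": [offset + i, offset + i + 0.5],
                          "b": [offset - i, offset - i - 0.5]}).to_csv(
                folder / f"file{i}.csv", index=False)
    data = load_folders(root)  # data is in memory, so it survives the directory cleanup

y = data["classification"]                # 1-D target, shape (n_samples,)
X = data.drop(columns="classification")   # 2-D features, shape (n_samples, n_features)

clf = svm.SVC()
clf.fit(X, y)
print(X.shape, y.shape)  # (18, 2) (18,)
```

pd.concat(..., ignore_index=True) is what turns the list of data frames from the question into one table; because the label column travels with the rows, features and targets cannot get out of sync.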

I assume this is a classification problem.
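If instead each file is one sample (which matches the question's list with one label per file), each file's values have to be flattened into a single feature row, so that X ends up with shape (n_files, n_features) and y with shape (n_files,). A minimal sketch, assuming every file has the same number of rows and columns; the random frames and labels below are hypothetical stand-ins for the two lists from the question. If the files differ in length, fixed-length summary features (for example per-column means) would be needed instead of plain flattening.

```python
import numpy as np
import pandas as pd
from sklearn import svm

# Hypothetical stand-ins for the question's two lists:
# one DataFrame per file and one integer label per file.
rng = np.random.default_rng(0)
frames = []
labels = []
for label in range(3):            # three classes
    for _ in range(5):            # five files per class
        values = rng.normal(loc=label * 5.0, size=(4, 2))  # each file: 4 rows x 2 columns
        frames.append(pd.DataFrame(values, columns=["a", "b"]))
        labels.append(label)

# Flattening only lines up if every file has the same shape
if len({df.shape for df in frames}) != 1:
    raise ValueError("all files must have the same number of rows and columns")

X = np.vstack([df.to_numpy().ravel() for df in frames])  # shape (n_files, rows * cols)
y = np.array(labels)                                     # shape (n_files,)

clf = svm.SVC()
clf.fit(X, y)
print(X.shape, y.shape)  # (15, 8) (15,)
```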

I think it is better to keep the data in one dataframe. For example, you can follow the steps below.

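  1. First, make sure your target variable is a single column of class labels, for example integers (the integer labels you already collect per file work as they are; SVC would also accept strings). One-hot encoding is another option you will see, but it is meant for categorical feature columns: SVC expects a one-dimensional target. There are also other possibilities. More information: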
    https://medium.com/analytics-vidhya/target-encoding-vs-one-hot-encoding-with-simple-examples-276a7e7b3e64;
    https://towardsdatascience.com/multiclass-classification-with-support-vector-machines-svm-kernel-trick-kernel-functions-f9d5377d6f02

  2. If the variables are measured on different scales, scale them, e.g. to the range 0-1 (min-max scaling). SVMs are not scale invariant, so this usually matters.
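Steps 1 and 2 might look like this in scikit-learn. This is a sketch on made-up data (the columns "length_mm", "weight_kg" and the text labels are hypothetical). LabelEncoder is only needed when the labels are strings and you want integers; MinMaxScaler is fitted on the training split only, so no information from the test rows leaks into the scaling.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Hypothetical data: two features on very different scales and a text label
data = pd.DataFrame({
    "length_mm": [120.0, 135.0, 980.0, 1010.0, 455.0, 470.0, 125.0, 990.0, 460.0],
    "weight_kg": [0.2, 0.3, 4.5, 4.8, 1.9, 2.1, 0.25, 4.6, 2.0],
    "classification": ["small", "small", "large", "large", "medium",
                       "medium", "small", "large", "medium"],
})

# Step 1: map each class name to an integer (classes are sorted alphabetically)
encoder = LabelEncoder()
y = encoder.fit_transform(data["classification"])
X = data.drop(columns="classification")

# Step 2: scale each feature to [0, 1]; fit the scaler on the training split only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=3, stratify=y, random_state=0)
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # test values may fall slightly outside [0, 1]

print(encoder.classes_.tolist())  # ['large', 'medium', 'small']
```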

#Comment: Depending on the dataset, further steps may be necessary.

  3. Fit the support vector classifier. More information in the official documentation:
    https://scikit-learn.org/stable/modules/svm.html

  4. Then you can validate and tune the model with sklearn, e.g. with grid search and cross-validation (CV).
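Steps 3 and 4 can be combined with a Pipeline and GridSearchCV. In this sketch the Iris dataset stands in for the real data (it has three classes, so it is a multiclass problem like the question's), and the parameter grid is an arbitrary example, not a recommendation. SVC handles more than two classes on its own (internally with a one-vs-one scheme), so no extra wrapper is needed.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Iris stands in for the real data: 3 classes, X shape (150, 4), y shape (150,)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Scaler + classifier in one estimator; make_pipeline names the steps
# "minmaxscaler" and "svc" (lowercased class names)
pipe = make_pipeline(MinMaxScaler(), SVC())

# Example grid; keys use the "<step name>__<parameter>" convention
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__kernel": ["linear", "rbf"],
}
search = GridSearchCV(pipe, param_grid, cv=5)  # 5-fold cross-validation
search.fit(X_train, y_train)

print(search.best_params_)
print("CV accuracy:", search.best_score_)
print("test accuracy:", search.score(X_test, y_test))
```

Putting the scaler inside the pipeline means each cross-validation fold fits it only on that fold's training part; search.score then evaluates the refitted best model on the held-out test split.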
