I have my training data separated into a few folders by category, so folder1 is class1 with 50+ files of data on that category, and so on. I read in my data, but I'm confused about how to format it so that the SVM model gets inputs of the right shape.
Summary of what I do: I read in the data by iterating over each folder and each file in each folder, and add each data frame to a list. For every file I iterate through, I add its class to another list as an integer.
Obviously this doesn't get a single dataframe to use in the model so how do I format this or should I do something else?
Possible steps required to fit a support vector model
As I understand it, you already separated your data into different data frames?
That is not necessary for sklearn. Here is an example of fitting sklearn's SVM from a single data frame, without splitting the data into separate data frames per class (pseudocode):
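# Pseudocode: `data` is a single data frame with a "classification" column
from sklearn import svm

classification = data["classification"]        # target labels, read before dropping the column
data = data.drop(columns=["classification"])   # the remaining columns are the features

From the official documentation:
clf = svm.SVC()
clf.fit(data, classification)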
I assume this is a classification problem.
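Since the question describes a list of data frames plus a parallel list of integer labels, here is a minimal sketch of how those two lists could be turned into the arrays that fit() expects. It rests on an assumption that is not stated in the question: each file is one sample, and every file has the same shape, so each data frame can be flattened into a feature vector of the same length. The names frames and labels, and the tiny example values, are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two lists built while reading the folders:
# one data frame per file, and one integer label per file.
frames = [
    pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]}),
    pd.DataFrame({"a": [5.0, 6.0], "b": [7.0, 8.0]}),
    pd.DataFrame({"a": [9.0, 10.0], "b": [11.0, 12.0]}),
]
labels = [0, 0, 1]

# Assumption: every file has the same shape, so each frame flattens
# to a 1-D feature vector of the same length.
X = np.vstack([df.to_numpy().ravel() for df in frames])
y = np.asarray(labels)

print(X.shape)  # (n_samples, n_features)
print(y.shape)  # (n_samples,)
```

If instead every row of every file is its own sample, concatenating the frames with pd.concat and repeating each file's label once per row would be the analogous step.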
I think it is better to keep the data in one dataframe. For example, you can follow the steps below.
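That means you should first convert your categorical target variable into numeric values, for example with label encoding or one-hot encoding (there are also other possibilities). Note that sklearn's SVC expects a one-dimensional array of labels, so integer labels like the ones you already collect are sufficient for the target. More information: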
https://medium.com/analytics-vidhya/target-encoding-vs-one-hot-encoding-with-simple-examples-276a7e7b3e64;
https://towardsdatascience.com/multiclass-classification-with-support-vector-machines-svm-kernel-trick-kernel-functions-f9d5377d6f02
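As a small illustration of the encoding step, the sketch below uses sklearn's LabelEncoder to map class names to integers. The folder names in class_names are hypothetical; if the labels are already integers, this step can be skipped.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical class names, e.g. taken from the folder each file came from.
class_names = ["class1", "class2", "class1", "class3"]

encoder = LabelEncoder()
y = encoder.fit_transform(class_names)  # 1-D array of integer codes

print(y)
print(encoder.classes_)  # the original names, indexed by their integer code
```

encoder.inverse_transform can later map predicted integers back to the class names.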
If the features are measured on different scales, rescale them, e.g. to the range 0-1 (min-max scaling).
#Comment: Depending on the dataset, further steps may be necessary.
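The scaling step could look like the following sketch, which uses sklearn's MinMaxScaler on a made-up feature matrix whose two columns have very different ranges.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up features: the second column is on a much larger scale than the first.
X = np.array([
    [1.0, 100.0],
    [2.0, 300.0],
    [3.0, 500.0],
])

scaler = MinMaxScaler()             # default feature_range is (0, 1)
X_scaled = scaler.fit_transform(X)  # each column is rescaled independently

print(X_scaled)
```

In practice the scaler should be fitted on the training data only and then applied to the test data with scaler.transform, so that no information from the test set leaks into training.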
Fit the support vector algorithm. More information in the official documentation:
https://scikit-learn.org/stable/modules/svm.html
Then you can validate and test the model with sklearn, e.g. using grid search and cross-validation (CV).
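Putting the steps together, a minimal end-to-end sketch might look like this. It uses the iris dataset bundled with sklearn as a stand-in for your own X and y, and the parameter grid is only an illustrative choice, not a recommendation for your data.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Stand-in data; replace with your own feature matrix and label array.
X, y = load_iris(return_X_y=True)

# Hold out a test set that is never seen during the grid search.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaling sits inside the pipeline, so it is refitted on each CV training fold.
pipeline = make_pipeline(MinMaxScaler(), SVC())

# Illustrative grid; make_pipeline names the SVC step "svc".
param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}

search = GridSearchCV(pipeline, param_grid, cv=5)  # 5-fold cross-validation
search.fit(X_train, y_train)

print(search.best_params_)
test_accuracy = search.score(X_test, y_test)  # accuracy of the best model
print(test_accuracy)
```

Keeping the scaler inside the pipeline means the cross-validation folds are scaled using statistics from their own training portion only.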