
How to train an SVM model in sklearn (Python) from an input CSV file?

I am using scikit-learn (sklearn) in Python for prediction. When I load the built-in dataset with

from sklearn import datasets and store the result in iris = datasets.load_iris(), training the model works fine. For my own data I load a CSV file instead:

iris = pandas.read_csv(r"E:\scikit\sampleTestingCSVInput.csv")
iris_header = ["Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width"]
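The iris_header list above has four names while the CSV has five columns, and it is never passed to read_csv. A minimal sketch of reading a headerless 5-column CSV with one name per column follows; the fifth name "Species" and the inline rows (taken from the classic Iris layout) are assumptions standing in for the real file. If your file already has a header row, drop header=None and names.

```python
import io

import pandas as pd

# Stand-in for the real file: 4 feature columns + 1 label column, no header row.
csv_text = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

# One name per column, including the label column ("Species" is a hypothetical name).
iris_header = ["Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width", "Species"]

# For a real file pass a raw-string path, e.g. r"E:\scikit\sampleTestingCSVInput.csv",
# instead of io.StringIO(csv_text).
iris = pd.read_csv(io.StringIO(csv_text), header=None, names=iris_header)
print(iris.shape)  # (4, 5)
```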

Model algorithm:

model = SVC(gamma='scale')
model.fit(iris.data, iris.target_names[iris.target])

But when I load the CSV file to train the model, and also create a new array for target_names, I get this error:

ValueError: Found input variables with inconsistent numbers of samples: [150, 4]
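This error means X and y disagree on the number of samples: fit() needs exactly one label per row of X. A small sketch reproducing the message with random stand-in data (150 rows, 4 features) and a label array of length 4, which is roughly what happens when something sized per column, rather than per row, is passed as y:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))       # 150 samples (rows), 4 features (columns)
y_wrong = np.array([0, 1, 2, 0])    # only 4 labels: one per column, not one per row

message = ""
try:
    SVC(gamma="scale").fit(X, y_wrong)
except ValueError as exc:
    message = str(exc)
print(message)  # Found input variables with inconsistent numbers of samples: [150, 4]

y_right = np.repeat([0, 1, 2], 50)  # 150 labels: one per row
model = SVC(gamma="scale").fit(X, y_right)
```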

My CSV file has 5 columns: 4 input columns and 1 output column. I need to fit the model to predict that output column.

How should I pass the arguments to fit()?

Could anyone share a code sample that imports a CSV file and fits an SVM model with sklearn in Python?
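For a file laid out as described (4 input columns, then 1 output column), one way to get arguments of matching length for fit() is to slice the DataFrame by position. A minimal sketch; the column names f1..f4 and label, and the inline rows, are hypothetical stand-ins for the real CSV:

```python
import io

import pandas as pd
from sklearn.svm import SVC

# Hypothetical 5-column CSV with a header row: 4 inputs, 1 output (last column).
csv_text = """f1,f2,f3,f4,label
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
"""
df = pd.read_csv(io.StringIO(csv_text))

X = df.iloc[:, :4].to_numpy()  # first four columns -> shape (n_samples, 4)
y = df.iloc[:, 4].to_numpy()   # fifth column -> shape (n_samples,)

model = SVC(gamma="scale")
model.fit(X, y)                # X and y now have the same number of rows
print(model.predict(X[:1]))
```

Positional slicing avoids depending on the exact header spelling; if the column names are known, df[[...]] and df["label"] work just as well.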

Since the question was not very clear to begin with and the attempts to explain it were not getting anywhere, I decided to download the dataset and do it myself. To make sure we are working with the same dataset, run iris.head(): a few column names and values might differ from mine, but the overall structure should be the same.

The first four columns are the features and the fifth one is the target (output).
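If you do not have the downloaded CSV at hand, scikit-learn's bundled copy of Iris can be loaded as a DataFrame with the same shape (four feature columns followed by a target column). This is a stand-in, not the answer's file: its column names differ and its target column holds integer codes rather than species names. It needs a scikit-learn version that supports as_frame (0.23 or later).

```python
from sklearn.datasets import load_iris

# load_iris(as_frame=True) returns a Bunch whose .frame attribute is a pandas
# DataFrame: four feature columns followed by a "target" column (integer codes).
iris = load_iris(as_frame=True).frame
print(iris.shape)          # (150, 5)
print(list(iris.columns))  # four "... (cm)" feature names, then "target"

features = iris.iloc[:, :4]  # first four columns
target = iris.iloc[:, 4]     # fifth column
```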

Next, you need X and Y as NumPy arrays. To get them, use:

X = iris[['sepal length:', 'sepal Width:', 'petal length', 'petal width']].values
Y = iris['Target'].values  # single brackets give a 1-D array, which is what fit() expects
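The number of brackets matters for Y: double brackets select a one-column DataFrame and give a 2-D column vector, while single brackets select a Series and give a 1-D array. scikit-learn expects y to be 1-D (SVC warns with a DataConversionWarning on a column vector). A small sketch with a hypothetical two-row frame using the column names from this answer:

```python
import pandas as pd

# Small hypothetical frame using the column names from the answer.
iris = pd.DataFrame({
    "sepal length:": [5.1, 7.0],
    "sepal Width:": [3.5, 3.2],
    "petal length": [1.4, 4.7],
    "petal width": [0.2, 1.4],
    "Target": ["setosa", "versicolor"],
})

X = iris[["sepal length:", "sepal Width:", "petal length", "petal width"]].values
Y_2d = iris[["Target"]].values  # double brackets -> DataFrame -> shape (2, 1)
Y = iris["Target"].values       # single brackets -> Series -> shape (2,)

print(X.shape, Y_2d.shape, Y.shape)  # (2, 4) (2, 1) (2,)
```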

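Since Y is categorical data, encode it as integers with sklearn's LabelEncoder (this is label encoding, not one-hot encoding; SVC also accepts string labels directly, so this step is optional), and scale the input X with StandardScaler: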

label_encoder = LabelEncoder()
Y = label_encoder.fit_transform(Y)
X = StandardScaler().fit_transform(X)
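To see what these two transformers do, here is a small sketch on made-up inputs: LabelEncoder maps each class name to an integer in sorted order (and inverse_transform maps back), and StandardScaler rescales each column to mean 0 and standard deviation 1.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler

labels = np.array(["versicolor", "setosa", "virginica", "setosa"])
label_encoder = LabelEncoder()
codes = label_encoder.fit_transform(labels)
print(label_encoder.classes_)                  # ['setosa' 'versicolor' 'virginica'] (sorted)
print(codes)                                   # [1 0 2 0]
print(label_encoder.inverse_transform(codes))  # back to the original strings

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0))  # approximately [0. 0.]
print(X_scaled.std(axis=0))   # approximately [1. 1.]
```

Note that fitting the scaler on the full dataset before splitting lets test-set statistics leak into training; fitting it on the training part only avoids that.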

Following the usual practice of keeping training and test data separate, split the dataset with:

X_train, X_test, y_train, y_test = train_test_split(X, Y)
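By default train_test_split shuffles the rows and holds out 25% of them for testing. A sketch using the bundled Iris data as a stand-in for the CSV; random_state (for a reproducible split) and stratify (to keep class proportions equal in both parts) are optional additions, not part of the answer's call:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, Y = load_iris(return_X_y=True)  # stand-in for the CSV data: 150 rows, 4 features

# test_size=0.25 is the default; random_state and stratify are optional extras.
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.25, random_state=42, stratify=Y
)
print(X_train.shape, X_test.shape)  # (112, 4) (38, 4)
```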

Now train the model on X_train and y_train:

clf = SVC(C=1.0, kernel='rbf').fit(X_train,y_train)
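Because the labels were encoded, clf.predict returns integer codes; the fitted LabelEncoder turns them back into class names. A sketch that mirrors the steps above, with the bundled Iris data (converted to string labels) standing in for the CSV:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import SVC

iris = load_iris()
# String labels, so the example mirrors a CSV whose last column holds names.
Y_text = iris.target_names[iris.target]
label_encoder = LabelEncoder()
Y = label_encoder.fit_transform(Y_text)
X = StandardScaler().fit_transform(iris.data)

X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)
clf = SVC(C=1.0, kernel="rbf").fit(X_train, y_train)

pred_codes = clf.predict(X_test[:5])                      # integer class codes
pred_names = label_encoder.inverse_transform(pred_codes)  # back to species names
print(pred_names)
```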

You can then use the test data to evaluate the model and tune the value of C as needed.
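One common way to tune C is a cross-validated grid search on the training data, followed by a single accuracy check on the test data. This sketch swaps in a Pipeline so the scaler is refitted on each training fold only; the grid of C values is an arbitrary choice, and the bundled Iris data stands in for the CSV:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # stand-in for the CSV data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)

# With the scaler inside the pipeline, it is fitted on training folds only.
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# make_pipeline names each step after its lowercased class, so SVC's C is "svc__C".
search = GridSearchCV(pipe, {"svc__C": [0.1, 1.0, 10.0, 100.0]}, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)           # the C value with the best mean CV accuracy
print(search.score(X_test, y_test))  # accuracy on the held-out test set
```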

Edit: in case you don't know where these functions come from, here are the import statements:

import pandas
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
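Putting the steps together, here is an end-to-end sketch that reads a 5-column CSV (4 inputs, 1 output) and fits an SVC. To make it runnable anywhere, it first writes the bundled Iris data to a temporary CSV as a stand-in; replace csv_path with your own file. Unlike the steps above, it fits the scaler on the training split only.

```python
import os
import tempfile

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import SVC

# Write a stand-in CSV (4 feature columns + 1 label column) so the sketch runs anywhere.
# Replace csv_path with your own file, e.g. r"E:\scikit\sampleTestingCSVInput.csv".
source = load_iris(as_frame=True)
frame = source.frame.copy()
frame["target"] = source.target_names[source.target.to_numpy()]  # string labels
csv_path = os.path.join(tempfile.mkdtemp(), "iris.csv")
frame.to_csv(csv_path, index=False)

# Read the CSV and split it into inputs and output by position.
data = pd.read_csv(csv_path)
X = data.iloc[:, :4].values  # 4 input columns
Y = data.iloc[:, 4].values   # 1 output column

label_encoder = LabelEncoder()
Y = label_encoder.fit_transform(Y)

X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)

scaler = StandardScaler().fit(X_train)  # fit scaling on training data only
clf = SVC(C=1.0, kernel="rbf").fit(scaler.transform(X_train), y_train)

print("test accuracy:", clf.score(scaler.transform(X_test), y_test))
```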
