How to improve the speed of a program for large data in Python

Question

I am trying to calculate prediction probabilities. I have written a program that calculates them, but it is very slow and takes a long time on a large dataset.

The aim is to calculate the prediction probability for each class of an SVM model built with LinearSVC and OneVsRestClassifier, but I get the error:

AttributeError: 'LinearSVC' object has no attribute 'predict_proba'

Because of that error, I tried the following:

Code

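import pandas as pd
from sklearn import svm
from sklearn.preprocessing import LabelEncoder

# df (training data) and test are existing pandas DataFrames
model_1 = svm.SVC(kernel='linear', probability=True)

# Fit one encoder on all addresses so that train and test share the same codes
addr_enc = LabelEncoder()
addr_enc.fit(pd.concat([df["Property Address"], test["Property Address"]]))
X_2 = addr_enc.transform(df["Property Address"]).reshape(-1, 1)
test_1 = addr_enc.transform(test["Property Address"]).reshape(-1, 1)

# Separate encoder for the target; y must stay 1-D
loc_enc = LabelEncoder()
y_2 = loc_enc.fit_transform(df["Location_Name"])

model_1.fit(X_2, y_2)

# Probabilities for the first test row only
results = model_1.predict_proba(test_1)[0]

# gets a dictionary of {'class_name': probability}, decoding the integer classes
prob_per_class_dictionary = dict(zip(loc_enc.inverse_transform(model_1.classes_), results))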

Is there another, faster way to do the same task? Any suggestions are welcome.
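predict_proba already returns one row of probabilities per input sample, so there is no need to call it row by row. A minimal sketch, assuming synthetic data from make_classification in place of the real addresses (model, proba and proba_df are illustrative names), that scores every test row in one call and stores the result with one column per class:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in for the real training and test data (assumption)
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_classes=3, random_state=0)
X_train, y_train, X_test = X[:250], y[:250], X[250:]

model = SVC(kernel="linear", probability=True, random_state=0)
model.fit(X_train, y_train)

# A single call scores every test row: shape (n_test_rows, n_classes)
proba = model.predict_proba(X_test)

# One row per test sample, one column per class label
proba_df = pd.DataFrame(proba, columns=model.classes_)
print(proba_df.head())
```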

Answer 1

You could wrap the model in scikit-learn's CalibratedClassifierCV if you need the predict_proba method.
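A minimal sketch of that idea, assuming synthetic 3-class data from make_classification in place of the real dataset: LinearSVC is passed to CalibratedClassifierCV, which fits it on cross-validation folds and maps its decision scores to probabilities (sigmoid calibration is the default). The estimator is passed positionally because the keyword name has changed between scikit-learn versions.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic 3-class data standing in for the real dataset (assumption)
X, y = make_classification(n_samples=600, n_features=5, n_informative=3,
                           n_classes=3, random_state=0)

# Wrap LinearSVC; calibration is fitted on 3 cross-validation folds
clf = CalibratedClassifierCV(LinearSVC(), cv=3)
clf.fit(X, y)

# predict_proba is now available: shape (n_rows, n_classes)
proba = clf.predict_proba(X[:5])

# Same {'class_name': probability} mapping as in the question, for the first row
prob_per_class = dict(zip(clf.classes_, proba[0]))
print(prob_per_class)
```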

Alternatively, you could use logistic regression (LogisticRegression), which provides predict_proba directly.
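For comparison, a minimal sketch with LogisticRegression on the same kind of synthetic data (make_classification is an assumption standing in for the real dataset); no wrapper or extra flag is needed to get probabilities:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic 3-class data standing in for the real dataset (assumption)
X, y = make_classification(n_samples=600, n_features=5, n_informative=3,
                           n_classes=3, random_state=0)

# max_iter raised so the default solver has room to converge
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Probabilities come straight from the fitted model: shape (n_rows, n_classes)
proba = clf.predict_proba(X[:5])
print(dict(zip(clf.classes_, proba[0])))
```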

If speed is the issue, consider using LinearSVC from sklearn.svm instead of SVC(kernel='linear'). It is faster, especially as the number of samples grows.
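One way to check the speed difference on your own machine is to time both fits on identical data. A rough sketch, assuming synthetic data whose size is chosen only for illustration; probability is left off on SVC so only the solvers are compared, and the printed timings will vary from machine to machine:

```python
import time

from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

# Synthetic data; size chosen only for illustration (assumption)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)


def fit_time(model):
    """Return the wall-clock seconds taken by model.fit(X, y)."""
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start


timings = {
    "LinearSVC": fit_time(LinearSVC()),
    "SVC(kernel='linear')": fit_time(SVC(kernel="linear")),
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} s")
```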

Answer 2

As suggested in another answer, LinearSVC is faster than SVC(kernel='linear').

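Regarding probabilities, SVC only exposes predict_proba() when its probability hyperparameter is set to True before fitting (see the scikit-learn SVC documentation). Note that this slows down fit, because scikit-learn calibrates the probabilities with an internal 5-fold cross-validation.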
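A small sketch of that behaviour, assuming synthetic data from make_classification: the same SVC configuration without and with probability=True.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic 3-class data standing in for the real dataset (assumption)
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_classes=3, random_state=0)

# Default probability=False: predict_proba is not exposed
plain = SVC(kernel="linear").fit(X, y)
print(hasattr(plain, "predict_proba"))  # False

# probability=True must be set before fit; fit then runs an extra internal
# cross-validation to calibrate the scores, which makes it slower
calibrated = SVC(kernel="linear", probability=True, random_state=0).fit(X, y)
proba = calibrated.predict_proba(X[:5])  # shape (5, 3)
```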

Tip: the fit time of SVC grows at least quadratically with the number of samples, so it is best suited to small and medium datasets; for large datasets, prefer other algorithms.
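As one example of such an alternative (a swapped-in technique, not something either answer names), scikit-learn's SGDClassifier trains a linear model with stochastic gradient descent and supports predict_proba with loss="modified_huber". A minimal sketch, assuming synthetic data; whether its accuracy is acceptable for the real address data is untested:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 3-class data standing in for a large real dataset (assumption)
X, y = make_classification(n_samples=5000, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)

# SGD is sensitive to feature scale, so standardize before the classifier
clf = make_pipeline(StandardScaler(),
                    SGDClassifier(loss="modified_huber", random_state=0))
clf.fit(X, y)

# modified_huber is one of the losses for which predict_proba is available
proba = clf.predict_proba(X[:5])
print(proba.round(3))
```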

2 answers

solution1
2 ACCEPTED 2018-10-30 12:26:07

solution2
0 2018-10-30 12:56:52
