How to improve the speed of a program for large data in Python

I am trying to calculate prediction probabilities. I have written a program that does this, but it is very slow and takes a long time on a large dataset.

The aim is to calculate the probability of each prediction from an SVM model. I first used LinearSVC with OneVsRestClassifier, but got this error:

AttributeError: 'LinearSVC' object has no attribute 'predict_proba'

Because of this error, I tried the following instead:

Code

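from sklearn import svm
from sklearn.preprocessing import LabelEncoder

model_1 = svm.SVC(kernel='linear', probability=True)

# separate encoders, so the feature and target mappings do not overwrite each other
x_encoder = LabelEncoder()
y_encoder = LabelEncoder()

X_2 = x_encoder.fit_transform(df["Property Address"])
y_2 = y_encoder.fit_transform(df["Location_Name"])

# reuse the encoder fitted on the training addresses;
# transform() raises ValueError for addresses not seen during fit
test_1 = x_encoder.transform(test["Property Address"])

# features must be 2-D; the target stays 1-D
X_2 = X_2.reshape(-1, 1)
test_1 = test_1.reshape(-1, 1)

model_1.fit(X_2, y_2)

# class probabilities for the first test row
results = model_1.predict_proba(test_1)[0]

# gets a dictionary of {'class_name': probability}
class_names = y_encoder.inverse_transform(model_1.classes_)
prob_per_class_dictionary = dict(zip(class_names, results))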

Is there another, faster way to do the same task? Please suggest.

You could wrap LinearSVC in sklearn's CalibratedClassifierCV if you need the predict_proba method.
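A minimal sketch of that suggestion, assuming synthetic data from make_classification in place of the asker's DataFrame (the variable names and the cv=3 setting are illustrative choices, not part of the question):

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic stand-in for the real data (assumption: 3 classes, 10 numeric features)
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)

# LinearSVC itself has no predict_proba; the calibration wrapper adds one
base = LinearSVC(max_iter=5000, random_state=0)
clf = CalibratedClassifierCV(base, cv=3)
clf.fit(X, y)

# One row of class probabilities per sample
proba = clf.predict_proba(X[:5])

# Dictionary of {class_label: probability} for the first sample
prob_per_class = dict(zip(clf.classes_, proba[0]))
print(prob_per_class)
```

The wrapper fits the base estimator on cross-validation folds and learns a mapping from its decision scores to probabilities, so fitting costs a few LinearSVC fits rather than one.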

Or you could use LogisticRegression, which provides predict_proba directly.
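A short sketch of the logistic regression route, again on synthetic data (the dataset and the max_iter value are assumptions made for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the real data (assumption: 3 classes, 10 numeric features)
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)

# LogisticRegression is a linear model with a built-in predict_proba
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X, y)

# Probabilities for the first five samples, one column per class
log_reg_proba = log_reg.predict_proba(X[:5])

# Dictionary of {class_label: probability} for the first sample
log_reg_prob_per_class = dict(zip(log_reg.classes_, log_reg_proba[0]))
print(log_reg_prob_per_class)
```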

If your issue is speed, consider using LinearSVC from sklearn.svm instead of SVC(kernel='linear'). It is faster.
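One way to check the speed difference on your own machine is to time both fits. The sketch below assumes synthetic data and a hypothetical helper named fit_seconds; the numbers it prints depend on hardware and data, so no expected output is shown. Note that the two estimators are not identical models: LinearSVC is built on liblinear, while SVC is built on libsvm.

```python
import time

from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

# Synthetic stand-in for the real data (assumption: 2000 samples, 20 features)
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=10, n_classes=3, random_state=0
)


def fit_seconds(model):
    """Return the wall-clock time in seconds needed to fit `model` on (X, y)."""
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start


linear_svc_time = fit_seconds(LinearSVC(max_iter=5000, random_state=0))
svc_time = fit_seconds(SVC(kernel="linear"))

print(f"LinearSVC:            {linear_svc_time:.3f} s")
print(f"SVC(kernel='linear'): {svc_time:.3f} s")
```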

As suggested in another answer, LinearSVC is faster than SVC(kernel='linear').

Regarding probability, SVC only exposes predict_proba() when its probability hyperparameter is set to True (it is False by default).
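A small sketch of how the probability hyperparameter behaves, using synthetic data as a stand-in (the dataset and variable names are assumptions for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in for the real data (assumption: 3 classes, 10 numeric features)
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=5, n_classes=3, random_state=0
)

# With the default probability=False, predict_proba is not available
plain_svc = SVC(kernel="linear")
has_proba_by_default = hasattr(plain_svc, "predict_proba")

# With probability=True, extra calibration runs during fit and
# predict_proba becomes usable
prob_svc = SVC(kernel="linear", probability=True, random_state=0)
prob_svc.fit(X, y)
svc_proba = prob_svc.predict_proba(X[:3])

print(has_proba_by_default)
print(svc_proba)
```

Enabling probability=True makes fit slower because the calibration uses an internal cross-validation, which matters on a dataset that is already slow to train on.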

Tip: SVMs are best suited to small and medium-sized datasets. The fit time of SVC grows at least quadratically with the number of samples, so prefer other algorithms for large datasets.

