How to speed up a Python program on a large dataset

Question

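I am trying to compute predicted probabilities. The program I wrote works, but it is very slow and takes far too long on large datasets.

The goal is to get the predicted probability of each class from an SVM model built with LinearSVC and OneVsRestClassifier, but I get the error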

AttributeError: 'LinearSVC' object has no attribute 'predict_proba'
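A minimal sketch that reproduces the cause of this error, using synthetic data from make_classification as a stand-in for the real address data (an assumption). LinearSVC only exposes decision_function, which returns unbounded per-class margin scores rather than probabilities.

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic 3-class data standing in for the real dataset (assumption)
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_classes=3, random_state=0)

clf = LinearSVC().fit(X, y)

# LinearSVC has no predict_proba method, hence the AttributeError
has_proba = hasattr(clf, "predict_proba")

# decision_function returns one signed margin score per class, not probabilities
scores = clf.decision_function(X[:2])
print(has_proba, scores.shape)
```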

Because of the error above, I tried the following code:

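import pandas as pd
from sklearn import svm
from sklearn.preprocessing import LabelEncoder

# df (training data) and test are assumed to be pandas DataFrames that are already loaded

# probability=True enables predict_proba, but fit becomes much slower because it
# runs an internal 5-fold cross-validation to calibrate the probabilities
model_1 = svm.SVC(kernel='linear', probability=True)

# Use separate encoders for the feature and the target; reusing one encoder
# overwrites the feature mapping when it is refit on the target
addr_encoder = LabelEncoder()
loc_encoder = LabelEncoder()

# Fit the address encoder on train and test addresses together, so the test set is
# encoded with the same mapping (refitting on the test set alone gives inconsistent codes)
addr_encoder.fit(pd.concat([df["Property Address"], test["Property Address"]]))

# scikit-learn expects X to be 2-D (n_samples, n_features) and y to be 1-D.
# Note: a single label-encoded integer imposes an arbitrary order on the addresses.
X_2 = addr_encoder.transform(df["Property Address"]).reshape(-1, 1)
y_2 = loc_encoder.fit_transform(df["Location_Name"])
test_1 = addr_encoder.transform(test["Property Address"]).reshape(-1, 1)

model_1.fit(X_2, y_2)

# Probabilities for the first test row only; drop [0] to get every row
results = model_1.predict_proba(test_1)[0]

# gets a dictionary of {'class_name': probability}, decoding the class labels back to location names
prob_per_class_dictionary = dict(zip(loc_encoder.inverse_transform(model_1.classes_), results))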

Is there another way to accomplish the same task? Any suggestions are appreciated.

Answer 1

If you need the predict_proba method, you can wrap LinearSVC in sklearn's CalibratedClassifierCV.
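A minimal sketch of that suggestion on synthetic data (make_classification and the train/test split are assumptions standing in for the question's data). CalibratedClassifierCV fits a sigmoid (Platt) mapping on LinearSVC's decision scores using internal cross-validation, which gives the wrapped model a predict_proba method. The estimator is passed positionally because its keyword name changed across scikit-learn versions.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic 3-class data standing in for the real dataset (assumption)
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train = X[:400], X[400:], y[:400]

# Sigmoid calibration on LinearSVC scores, estimated with 3-fold cross-validation
calibrated = CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=3)
calibrated.fit(X_train, y_train)

proba = calibrated.predict_proba(X_test)

# Same {'class': probability} mapping as in the question, for the first test row
prob_per_class = dict(zip(calibrated.classes_, proba[0]))
print(prob_per_class)
```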

Alternatively, you can use Logistic Regression.
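A minimal sketch of the Logistic Regression route, again on synthetic data (an assumption). LogisticRegression models class probabilities directly, so predict_proba is available without any calibration wrapper.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic 3-class data standing in for the real dataset (assumption)
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# max_iter raised from the default to reduce the chance of a convergence warning
log_reg = LogisticRegression(max_iter=1000).fit(X, y)

proba = log_reg.predict_proba(X[:5])
prob_per_class = dict(zip(log_reg.classes_, proba[0]))
print(prob_per_class)
```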

If your problem is speed, consider using LinearSVC from sklearn.svm instead of SVC(kernel='linear'). It is faster.
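A rough benchmark sketch for checking the speed difference on your own machine. The synthetic data and sample count are assumptions, and no timings are claimed here because they depend on data size, feature count, and hardware.

```python
import time

from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

# Synthetic binary data (assumption); increase n_samples to see the gap grow
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)


def fit_time(model):
    """Return the wall-clock seconds spent in model.fit(X, y)."""
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start


# SVC is built on libsvm (a general kernel solver); LinearSVC is built on liblinear,
# which is specialised for linear models
t_svc = fit_time(SVC(kernel="linear"))
t_linear = fit_time(LinearSVC())
print(f"SVC(kernel='linear'): {t_svc:.3f}s  LinearSVC: {t_linear:.3f}s")
```

The two models are not numerically identical: by default LinearSVC uses the squared hinge loss and a one-vs-rest multiclass scheme, while SVC uses the hinge loss and one-vs-one, so predictions can differ slightly.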

Answer 2

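As suggested in the other answer, LinearSVC is faster than SVC(kernel='linear').

Regarding probabilities: SVC only makes predict_proba() available when its probability hyperparameter is set to True before fitting.

Tip: SVMs are better suited to small and medium-sized datasets, so prefer other algorithms for large ones. According to the scikit-learn documentation, SVC's fit time scales at least quadratically with the number of samples.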
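A minimal sketch of this behaviour on synthetic data (an assumption). With the default probability=False, accessing predict_proba raises AttributeError; with probability=True set before fit, it works, at the cost of an extra internal cross-validation during fit.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic 3-class data standing in for the real dataset (assumption)
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_classes=3, random_state=0)

# Default probability=False: predict_proba is not usable
plain_svc = SVC(kernel="linear").fit(X, y)
try:
    plain_svc.predict_proba(X[:1])
    proba_available = True
except AttributeError:
    proba_available = False

# probability=True must be set before fit; random_state makes the internal CV reproducible
prob_svc = SVC(kernel="linear", probability=True, random_state=0).fit(X, y)
proba = prob_svc.predict_proba(X[:3])
print(proba_available, proba.shape)
```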
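One alternative that the scikit-learn documentation points to for large datasets is SGDClassifier. The sketch below is an illustration under assumptions (synthetic data, modified_huber loss chosen because it supports predict_proba), not something either answer prescribes. SGDClassifier also offers partial_fit for training in chunks when the data does not fit in memory.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Larger synthetic 3-class dataset (assumption)
X, y = make_classification(n_samples=20000, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)

# Linear classifier trained by stochastic gradient descent;
# the modified_huber loss makes predict_proba available
sgd = SGDClassifier(loss="modified_huber", random_state=0).fit(X, y)

proba = sgd.predict_proba(X[:5])
prob_per_class = dict(zip(sgd.classes_, proba[0]))
print(prob_per_class)
```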

Answer 1: 2 votes, accepted, 2018-10-30 12:26:07
Answer 2: 0 votes, 2018-10-30 12:56:52