sklearn Logistic Regression has too little accuracy even if I try to predict with the train data

I am currently trying to use Logistic Regression on some vectors, using the sklearn library.

Here is my code. I first load the files that contain the data and then assign the values to arrays.

import kaldiio
import numpy as np
from sklearn.linear_model import LogisticRegression

# load files (lazy dict-like mapping: file_id -> x-vector)
xvectors_train = kaldiio.load_scp('train/xvector.scp')

# create empty arrays where to store the data
file_ids = list(xvectors_train.keys())
x_train = np.empty(shape=(len(file_ids), len(xvectors_train[file_ids[0]])))
y_train = np.empty(len(file_ids), dtype=object)

# assign values to the empty arrays
# (the label is the part of the file ID before the first '_')
for i, file_id in enumerate(file_ids):
  x_train[i] = xvectors_train[file_id]
  y_train[i] = file_id.split('_')[0]

# create a model and train it
model = LogisticRegression(max_iter=200, solver='liblinear')
model.fit(x_train, y_train)

# predict
model.predict(x_train)

# score
score = model.score(x_train, y_train)

For some reason, even if I use the x_train data for my predictions, the score is about 0.32. Shouldn't it be 1.0, because the model already knows the answers for those? If I use my test data, the score is still about 0.32.

Does anyone know what the problem could be?

There isn't any obvious problem, and the result looks normal: your test score is very similar to your training score.

Most models try to learn rules/parameters that generalize to new data rather than memorize the existing training data, so the assumption behind "Shouldn't it be 1.0, because the model already knows the answers for those?" does not hold.
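To illustrate, here is a minimal sketch on synthetic data. The dataset, the seed, and the choice of a 1-nearest-neighbour classifier as a stand-in for a "memorizing" model are assumptions made for this example, not something taken from the question. A linear model such as LogisticRegression cannot separate overlapping classes perfectly, so its training accuracy stays well below 1.0, whereas a model that effectively stores the training set scores 1.0 on it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Two heavily overlapping Gaussian classes (synthetic, for illustration only)
n = 500
X = np.vstack([rng.normal(0.0, 1.0, size=(n, 2)),
               rng.normal(0.5, 1.0, size=(n, 2))])
y = np.array([0] * n + [1] * n)

# Same settings as in the question
logreg = LogisticRegression(max_iter=200, solver='liblinear').fit(X, y)
# 1-NN finds each training point itself as nearest neighbour, i.e. it memorizes
memorizer = KNeighborsClassifier(n_neighbors=1).fit(X, y)

logreg_train_acc = logreg.score(X, y)
memorizer_train_acc = memorizer.score(X, y)
print(f"LogisticRegression train accuracy: {logreg_train_acc:.2f}")
print(f"1-NN train accuracy:               {memorizer_train_acc:.2f}")
```

A perfect training score from the second model says nothing about how it would do on unseen data; it only shows that memorization is a property of the model family, not something every classifier does.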

If you were actually seeing a test set score significantly lower than your training score (e.g., 0.32 vs 1.0), it would mean your model is badly overfitting and needs to be fixed.
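One way to check for this is to hold out part of the data and compare the two scores. The sketch below is only illustrative: the noisy synthetic data, the hypothetical helper `train_test_gap`, and the unrestricted DecisionTreeClassifier used as an example of an overfitting model are all assumptions, not part of the original code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic data: the label depends on the first feature plus substantial noise
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + rng.normal(scale=1.5, size=1000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

def train_test_gap(model):
    """Fit on the training split; return (train accuracy, test accuracy, gap)."""
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    return train_acc, test_acc, train_acc - test_acc

# An unrestricted tree can fit the noise in the training split
tree_train, tree_test, tree_gap = train_test_gap(
    DecisionTreeClassifier(random_state=0))
# Logistic regression with the settings from the question
lr_train, lr_test, lr_gap = train_test_gap(
    LogisticRegression(max_iter=200, solver='liblinear'))

print(f"tree:   train={tree_train:.2f} test={tree_test:.2f} gap={tree_gap:.2f}")
print(f"logreg: train={lr_train:.2f} test={lr_test:.2f} gap={lr_gap:.2f}")
```

A large positive gap (as with the tree here) is the overfitting signature; train and test scores that are close, as in the question, point instead to a model or features that simply cannot separate the classes better.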
