
Features in sklearn logistic regression

I have a problem with adding my own features to sklearn.linear_model.LogisticRegression. But anyway, let's look at some example code:

from sklearn.linear_model import LogisticRegression
import numpy as np

# Numbers are the class of each tag
resultsNER = np.array([1, 2, 3, 4, 5])

# According to resultsNER, every row is a different class, so each should
# have its own features -- but this way every row has the same feature set
xNER = np.array([[1., 0., 0., 0., -1., 1.],
                 [1., 0., 1., 0., 0., 1.],
                 [1., 1., 1., 1., 1., 1.],
                 [0., 0., 0., 0., 0., 0.],
                 [1., 1., 1., 0., 0., 0.]])

# Assign resultsNER to y
y = resultsNER
# Create the logistic regression model
logit = LogisticRegression(C=1.0)
# Fit it
logit.fit(xNER, y)

# A test vector to check which class will be predicted
# (2D, since predict expects a matrix of samples)
xPP = np.array([[1., 1., 1., 0., 0., 1.]])

print("expected: ", y)
print("predicted:", logit.predict(xPP))
print("decision: ", logit.decision_function(xNER))
print(logit.coef_)
print("params: ", logit.get_params(deep=True))

The code above is clear and simple. I have some classes which I call 1, 2, 3, 4, 5 (resultsNER); they correspond to classes like "date", "person", "organization", etc. For each class I write custom feature functions which return true or false, in this case ones and zeros. Example: if a token matches "(S|s)unday", it belongs to the date class. Mathematically it is clear: for each token I evaluate every class's feature functions, then look at which class has the maximum sum of feature values (that is why they return numbers, not booleans) and pick it. In other words, I apply an argmax. Of course, each feature has an alpha coefficient in the summation. This is a multiclass classification problem, so I need to know how to add multiclass features to sklearn.LogisticRegression.
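The per-class argmax scoring described above can be written out directly. A minimal sketch -- the feature functions and alpha coefficients below are hypothetical stand-ins for the real "date"/"person"/"organization" features:

```python
# Hypothetical boolean feature functions, one list per class.
# Each returns 1.0 or 0.0 for a token (stand-ins for the real features).
feature_funcs = {
    1: [lambda t: float(t.lower() == "sunday")],  # "date"-like token
    2: [lambda t: float(t.istitle())],            # capitalized like a name
    3: [lambda t: float(t.isupper())],            # all-caps like an acronym
}

# Hypothetical per-class alpha coefficients, one per feature.
alphas = {1: [2.0], 2: [1.0], 3: [1.5]}

def classify(token):
    # Score each class as the alpha-weighted sum of its feature values,
    # then pick the class with the maximum score (argmax).
    scores = {c: sum(a * f(token) for a, f in zip(alphas[c], funcs))
              for c, funcs in feature_funcs.items()}
    return max(scores, key=scores.get)

print(classify("Sunday"))  # -> 1 (the "sunday" feature fires with alpha 2.0)
```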

I need two things: the alpha coefficients, and a way to add my own features to the logistic regression. The most important part for me is how to add my own per-class feature functions to sklearn.LogisticRegression.

I know I can compute the coefficients by gradient descent. But I think that when I call fit(x, y), LogisticRegression uses some algorithm to compute the coefficients, which I can then read from the .coef_ attribute.
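That is correct: fit() estimates the coefficients with an iterative solver, and once the model is fitted the decision scores are just linear functions of the features. A quick sanity check, reusing the xNER and y arrays from the question:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1., 0., 0., 0., -1., 1.],
              [1., 0., 1., 0., 0., 1.],
              [1., 1., 1., 1., 1., 1.],
              [0., 0., 0., 0., 0., 0.],
              [1., 1., 1., 0., 0., 0.]])
y = np.array([1, 2, 3, 4, 5])

logit = LogisticRegression(C=1.0).fit(X, y)

# The decision function is a linear score per class:
# one row of coef_ (and one intercept) per class.
scores = X @ logit.coef_.T + logit.intercept_
assert np.allclose(scores, logit.decision_function(X))

# predict() picks the class with the highest score (argmax).
assert (logit.classes_[scores.argmax(axis=1)] == logit.predict(X)).all()
```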

So in the end my main question is how to add custom features for the different classes in my example, classes 1, 2, 3, 4, 5 (resultsNER).

I'm not quite sure about your question, but a few things might help you:

  • You can use the predict_proba function to estimate probabilities for each class:

     >>> logit.predict_proba(xPP)
     array([[ 0.1756304 ,  0.22633999,  0.25149571,  0.10134168,  0.24519222]])
  • If you want features to have some weights (is this what you're calling alpha?), you do it not in the learning algorithm but in the preprocessing phase. In your case you can use an array of coefficients:

     >>> logit = LogisticRegression(C=1.0).fit(xNER,y)
     >>> logit.predict(xPP)
     array([3])
     >>> alpha = np.array([[0.2, 0.2, 1, 1, 0.3, 1]])
     >>> logit = LogisticRegression(C=1.0).fit(alpha*xNER,y)
     >>> logit.predict(alpha*xPP)
     array([2])
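Building on the "do it in preprocessing" point: one way (a sketch, not the only one) to plug custom feature functions into sklearn is a Pipeline whose first step is a FunctionTransformer that maps raw tokens to a feature matrix, so fitting and predicting work directly on tokens. The feature functions here are hypothetical stand-ins:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical token-level feature functions (stand-ins for the real ones).
def features(token):
    return [float(token.lower() == "sunday"),  # looks like a date word
            float(token.istitle()),            # capitalized like a name
            float(token.isupper())]            # all-caps like an acronym

def featurize(tokens):
    # Turn a list of raw tokens into a (n_tokens, n_features) matrix.
    return np.array([features(t) for t in tokens])

pipe = Pipeline([
    ("features", FunctionTransformer(featurize)),
    ("logit", LogisticRegression(C=1.0)),
])

tokens = ["sunday", "Sunday", "John", "IBM", "the"]
labels = [1, 1, 2, 3, 4]
pipe.fit(tokens, labels)
print(pipe.predict(["Monday"]))
```

This keeps the feature extraction and the classifier in one object, so the same featurize step is applied consistently at training and prediction time.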
