Logistic Regression in Python

I am currently working on logistic regression in Python for a machine learning task. This is the code I wrote.

import pandas as pd
from sklearn import linear_model
import numpy as np
from sklearn.utils import column_or_1d

logistic = linear_model.LogisticRegression()

data = pd.read_excel('/home/mick/PycharmProjects/project1/excel/Ron95_Price_Class.xlsx')

X = data[['Date']]
y = data[['Ron95_RM']]

y = np.ravel(y)

logistic.fit(X, y)

price = logistic.predict(42491)
print "The price for Ron95 in next month will be RM", np.array_str(price,1)

This is the output of the code:

The price for Ron95 in next month will be RM [ u'B']

There is no error, but my question is whether the characters after RM in the output should be 'B' or something else. I wonder whether my code is wrong or whether this is just a formatting issue with the NumPy array.

I basically just started with Python today, so sorry if I have made a stupid mistake.

I think it would be easier to help if you posted some data from Ron95_Price_Class.xlsx.
Right now I see that you are not removing the target variable (y) from the training data. You can do it like this:

X = data['Date']                     # single brackets are enough when selecting only one column
y = data['Ron95_RM']                 # target column
X = data.drop('Ron95_RM', axis=1)    # axis=1 drops a column; without it drop() looks for a row label
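
As a minimal sketch of the split described above: the rows below are hypothetical stand-ins for the spreadsheet, and only the column names Date and Ron95_RM come from the question.

```python
import pandas as pd

# Hypothetical rows; the real data lives in Ron95_Price_Class.xlsx.
data = pd.DataFrame({
    "Date": [42370, 42401, 42430, 42461],
    "Ron95_RM": ["A", "A", "B", "B"],
})

y = data["Ron95_RM"]                # 1-D Series holding the target labels
X = data.drop("Ron95_RM", axis=1)   # DataFrame with every column except the target

print(list(X.columns))
print(X.shape, y.shape)
```

Because drop() returns a DataFrame, X stays two-dimensional, which is the shape scikit-learn estimators expect for the feature matrix.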

If I am not mistaken, the 'u' is just notation indicating that the string is a unicode string. I am not sure how you are running your code, but when I test it in an IPython notebook or in a Windows command prompt I get the following output:

The price for Ron95 in next month will be RM [ 'B']

This is perhaps because I ran it in Python 3.5, whereas it appears you are still using Python < 3.0.
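
A small sketch of where the quotes and the prefix come from, assuming Python 3.3 or later (where the u prefix is accepted and has no effect). The names label, shown_in_list and shown_alone are illustrative only.

```python
label = u'B'   # same value as 'B' in Python 3

# A container prints the repr() of its elements, so the quotes appear;
# under Python 2 the repr of a unicode string also carries the u prefix.
shown_in_list = str([label])

# Printing the string itself shows only its contents.
shown_alone = str(label)

print(shown_in_list)
print(shown_alone)
```

The prefix is part of how the value is displayed, not part of the value.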

It's not that your answer is wrong; you are just getting information about the format of the data. For other questions on this subject see here and here. The Python how-to on Unicode may also be helpful.

The scikit-learn documentation for the predict method, http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.predict , states that predict returns an array of shape [n_samples]. So for you the result is a one-element array. To get the desired output you can try "price[0]".
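
A minimal sketch of the indexing suggestion, using a made-up toy dataset in place of the spreadsheet. The feature values and the labels 'A' and 'B' are assumptions for illustration. Recent scikit-learn versions expect a 2-D input to predict, hence the nested list.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: one numeric feature and a class label per row (hypothetical values).
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array(["A", "A", "A", "B", "B", "B"])

logistic = LogisticRegression()
logistic.fit(X, y)

# One sample in, so an array with one class label comes out.
price = logistic.predict([[6]])

print(price.shape)   # shape of the returned array
print(price[0])      # the bare label, without brackets or quotes
```

Indexing with price[0] pulls the single label out of the array, so it prints without the surrounding array formatting.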
