Why is my Logistic Regression score always 1.0?
I am using sklearn in Python, and the idea of the implementation is to predict the S&P 500 (SPX) using logistic regression.
I got the SPX historical prices from yfinance and calculated 5 features (X) based on daily returns (which I also calculated).
The dependent variable (y) is 1 for positive returns and 0 otherwise.
When I fit the model and check model.score(), the value is always 1. But why?
The code is:
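# Imports needed by the code below
import numpy as np
import yfinance as yf
from sklearn.linear_model import LogisticRegression
# Import the data
df = yf.download('^GSPC', start="2018-1-1", end="2020-10-20")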
df = df.dropna()
df['Return'] = np.log(df['Adj Close']/df['Adj Close'].shift(1))
# Create Indicators
df['Ret_1'] = df['Return'].shift(1)
df['Ret_2'] = df['Return'].shift(2)
df['Adj Close-Adj Close 1day'] = df['Adj Close'] - df['Adj Close'].shift(1)
df['Adj Close-Adj Close 5days'] = df['Adj Close'] - df['Adj Close'].shift(5)
# Note: the column is named S_10, but the rolling window used here is 5 days
df['S_10'] = df['Adj Close'].rolling(window=5).mean()
df = df.dropna()
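# Features: the last five columns (Ret_1, Ret_2, the two price differences, S_10)
X = df.iloc[:, -5:]
# Target: 1 if the same day's return is positive, else 0
y = np.where(df['Return'] > 0, 1, 0)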
# Split the Dataset and Instantiate Logistic Regression
split = int(0.7*len(df))
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]
model = LogisticRegression()
model = model.fit(X_train, y_train)
predicted = model.predict(X_test)
print(model.score(X_test, y_test))
Among the 5 features that you are feeding into your logistic regression, the variable
df['Adj Close-Adj Close 1day'] = df['Adj Close'] - df['Adj Close'].shift(1)
has the same sign as the underlying variable df['Return'] that you are using in the target
y = np.where(df['Return'] > 0, 1, 0)
hence the logistic regression is going to fit very well. The logarithm is positive exactly when its argument exceeds 1, so df['Return'] > 0 holds exactly when df['Adj Close'] is greater than df['Adj Close'].shift(1), that is, when this difference is positive.
Moreover, both use df['Adj Close'], so you are trying to predict a target value using something that in fact enables you to calculate the target value exactly.
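To see the leakage without fitting any model, here is a minimal sketch. It assumes a synthetic random-walk price series in place of the yfinance download so that it runs offline. The names `rule`, `y_next`, and the next-day target are illustrative choices, not part of the original code.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded prices (assumption: a random walk,
# not real SPX data), so the example runs without network access.
rng = np.random.default_rng(0)
log_returns = rng.normal(loc=0.0, scale=0.01, size=1000)
df = pd.DataFrame({"Adj Close": 100.0 * np.exp(np.cumsum(log_returns))})

# Same definitions as in the question.
df["Return"] = np.log(df["Adj Close"] / df["Adj Close"].shift(1))
df["Adj Close-Adj Close 1day"] = df["Adj Close"] - df["Adj Close"].shift(1)
df = df.dropna()

# Target from the question: sign of the same day's return.
y = np.where(df["Return"] > 0, 1, 0)

# A one-line rule on the leaky feature, with no model at all.
rule = np.where(df["Adj Close-Adj Close 1day"] > 0, 1, 0)
same_day_accuracy = (rule == y).mean()

# Hypothetical alternative target: the direction of the NEXT day's return,
# which is not yet known when today's features are computed.
next_day = df["Return"].shift(-1).dropna()
y_next = np.where(next_day > 0, 1, 0)
next_day_accuracy = (rule[:-1] == y_next).mean()

print(f"same-day target: {same_day_accuracy:.3f}")
print(f"next-day target: {next_day_accuracy:.3f}")
```

The rule reproduces the same-day target exactly, which is why model.score() comes out as 1. If the goal is to forecast, every feature has to be known before the return in the target is realized. Predicting the next day's direction is one way to set that up, and on a random walk like this one the accuracy should then fall close to 0.5.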