
Constrained Logistic Regression in Python

To predict future stock movements, we use a logistic regression in Python. We first convert the daily data into weekly (5-trading-day) returns. Next, we determine whether the return is going up or down, using this code:

import numpy as np

# calculate weekly (5-trading-day) log returns and the market direction
stock['returns'] = np.log(stock['close'] / stock['close'].shift(5))
stock.dropna(inplace=True)
stock['direction'] = np.sign(stock['returns']).astype(int)

We have three direction signals:

 0  = hold the stock / do nothing
 1  = buy the stock
-1 = sell the stock

Below is the code to determine the long direction: if the stock is still increasing, hold the stock; otherwise, sell it.

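import itertools
from operator import itemgetter

stock['long direction'] = 0
long_col = stock.columns.get_loc('long direction')
# group consecutive days that share the same direction into runs
for val, group in itertools.groupby(enumerate(stock['direction']), itemgetter(1)):
    # tuple unpacking: `irrelevant` collects every (position, direction) pair of the run except the final one; `last` is the pair we care about
    [*irrelevant, last] = group
    # mark the final day of the run with the opposite sign (positional write avoids chained assignment)
    stock.iloc[last[0], long_col] = -last[1]

del stock['direction']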

The output is as follows:

    Date        close       returns     long direction
    2021-12-08  1068.95     -0.024068   0
    2021-12-09  1003.79     -0.077418   1
    2021-12-10  1017.03      0.002028   -1
    2021-12-13  966.40      -0.043137   0
    2021-12-14  958.51      -0.092831   0
    2021-12-15  975.98      -0.090989   0
    2021-12-16  926.91      -0.079681   0
    2021-12-17  932.57      -0.086698   1
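The run-detection step can be checked in isolation. The sketch below is a standard-library-only illustration, not the question's exact code: the helper name `run_end_signals` is hypothetical, and the input list is the sign pattern of the `returns` column in the table above.

```python
from itertools import groupby
from operator import itemgetter

def run_end_signals(directions):
    """Return 0 everywhere except the last day of each run, which gets the opposite sign."""
    signals = [0] * len(directions)
    # groupby yields one group per run of equal consecutive directions
    for _key, run in groupby(enumerate(directions), key=itemgetter(1)):
        # keep only the final (position, direction) pair of the run
        *_rest, (last_pos, last_dir) = run
        signals[last_pos] = -last_dir
    return signals

# signs of the eight returns shown in the table above
signals = run_end_signals([-1, -1, 1, -1, -1, -1, -1, -1])
print(signals)  # [0, 1, -1, 0, 0, 0, 0, 1]
```

The result matches the `long direction` column of the table, which confirms that a non-zero signal is only ever written on the last day of a run.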

We have used a logistic regression to predict the future movements, but we don't know how to add a constraint that prevents short selling. That is, we don't want two -1 signals in a row, and we don't want two 1 signals in a row. A run of 0, 0, 0, 0 is fine; it means that we hold the stock. Our dataset is heavily imbalanced, with signal 0 predominant. We have dealt with this issue by using class_weight.

Below is the current code:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

stock = stock.dropna()
X = stock.loc[:, stock.columns != 'long direction']
y = stock['long direction']

# shuffle=False keeps the chronological order, so the last 5% of the rows form the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05, random_state=5, shuffle=False)

# note: scikit-learn >= 1.2 expects penalty=None instead of penalty='none'
model1 = LogisticRegression(random_state=0, multi_class='multinomial', penalty='none', solver='newton-cg', class_weight={-1:0.35, 0:0.3, 1:0.75}).fit(X_train, y_train)
preds = model1.predict(X_test)

# print the model's parameters (apart from the arguments passed above, everything is kept at its default)
params = model1.get_params()
print(params)

Below is the output of the code. The model predicts -1 on consecutive days, which is exactly what we do not want.

           long direction   pred
Date        
2021-10-11  0               -1
2021-10-12  0               -1
2021-10-13  0               -1
2021-10-14  0               -1
2021-10-15  0               -1

How can we add a constraint so that the model knows that consecutive -1, -1 and consecutive 1, 1 signals are not possible?

There are a few ways to do this.

The first approach is to modify the recommendation after the logistic regression has produced it. If the model returns -1 twice in a row, some other mechanism decides what to change that second -1 into. This could be anything: a simple decision rule you create yourself, going back in the projections to use previous long values, etc. This approach feels more like a bodge than an elegant solution. If you prefer something more elegant, continue reading for the second approach.
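A minimal sketch of one such decision rule follows. It is only one possible choice, not something prescribed by the question: the helper name `enforce_alternation` is hypothetical, and the rule assumed here is that a buy or sell repeating the last non-zero signal that was kept gets replaced by 0 (hold). This is slightly stricter than only forbidding adjacent repeats, since -1, 0, -1 is also cleaned up.

```python
def enforce_alternation(preds):
    """Replace any non-zero signal that repeats the last non-zero signal kept with 0."""
    cleaned = []
    last_trade = 0  # last buy/sell signal kept so far (0 means none yet)
    for signal in preds:
        if signal != 0 and signal == last_trade:
            cleaned.append(0)  # repeated buy/sell becomes "hold"
        else:
            cleaned.append(signal)
            if signal != 0:
                last_trade = signal
    return cleaned

raw = [-1, -1, -1, 0, 1, 1, 0, -1]
print(enforce_alternation(raw))  # [-1, 0, 0, 0, 1, 0, 0, -1]
```

Applied to the five consecutive -1 predictions from the question, this rule would keep the first -1 and turn the remaining four into 0.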

The second approach is to change your model so that it includes that fact. The foundational idea of machine learning algorithms like logistic regression is that they make the prediction for you automatically, without you having to create and define explicit rules. In line with this, you can modify your training data so that the previous signal becomes part of the input, and the training output will then reflect this fact as well. If your training data always has the previous indicator as one of its inputs, and the output is never -1 whenever that previous indicator is -1 (because you do not want two -1 in a row), then the model will learn not to give you a -1 on new inference data whose previous indicator was -1.
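A sketch of how the previous signal could be added as an input is shown below. The column name `prev direction`, the one-hot encoding, and the choice to treat the first row as coming after a 0 are all assumptions made for illustration; the values are the eight rows from the question's output table. Note that a logistic regression only learns this pattern as a strong tendency, not as a hard guarantee, so combining it with a check on the predictions may still be worthwhile.

```python
import pandas as pd

# The eight rows from the question's output table
stock = pd.DataFrame({
    "returns": [-0.024068, -0.077418, 0.002028, -0.043137,
                -0.092831, -0.090989, -0.079681, -0.086698],
    "long direction": [0, 1, -1, 0, 0, 0, 0, 1],
})

# Previous day's signal as a feature; assumption: the first row follows a 0 (hold)
stock["prev direction"] = stock["long direction"].shift(1).fillna(0).astype(int)

# One-hot encode so the model gets a separate coefficient per previous state
prev_dummies = pd.get_dummies(stock["prev direction"], prefix="prev")
X = pd.concat([stock[["returns"]], prev_dummies], axis=1)
y = stock["long direction"]

# Sanity check: does any training label repeat the previous non-zero signal?
repeats = (y != 0) & (y == stock["prev direction"])
print(repeats.any())  # False
```

At prediction time the previous signal is not known in advance for future rows, so predictions would have to be made one row at a time, feeding each prediction in as the next row's `prev direction`.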

A third approach, perhaps in between the first and the second, is to have three different trained models: one trained on the rows where the previous direction was -1, one where it was 0, and one where it was 1. The first and third models do not even have -1 and 1, respectively, in their label output space, so they cannot go wrong in that way. The second model would have all three of -1, 0, and 1.
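The three-model idea could be sketched as follows. Everything here is illustrative: the data is synthetic random noise (not real prices), generated so that a non-zero label is never followed by the same label, and the helper name `predict_sequence` is hypothetical. Default `LogisticRegression` settings are used rather than the question's arguments.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in data: two random features and labels that obey the constraint
n = 600
X = rng.normal(size=(n, 2))
allowed = {-1: [0, 1], 0: [-1, 0, 1], 1: [-1, 0]}
y = np.zeros(n, dtype=int)
for i in range(1, n):
    y[i] = rng.choice(allowed[int(y[i - 1])])
prev = np.concatenate(([0], y[:-1]))  # previous signal for each row

# One classifier per previous state; each only sees the labels allowed after that state
models = {
    state: LogisticRegression().fit(X[prev == state], y[prev == state])
    for state in (-1, 0, 1)
}

def predict_sequence(models, X_new, start_state=0):
    """Predict row by row, using the model that matches the previous prediction."""
    preds, state = [], start_state
    for row in X_new:
        state = int(models[state].predict(row.reshape(1, -1))[0])
        preds.append(state)
    return preds

preds = predict_sequence(models, X[-50:])
print(models[-1].classes_, models[1].classes_)
```

Because the model used after a -1 has never seen a -1 label (and likewise for 1), an adjacent repeat cannot occur in `preds`. A pattern such as -1, 0, -1 is still possible with this design.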

