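Having trouble fitting data to HMM-Learn model (Python3.9)
I am trying to fit a hidden Markov model to some stock data from the S&P 500.
The data was downloaded from Yahoo Finance and is stored in a CSV file covering 250 trading days. This code worked a week ago, but now it no longer seems to work.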
import pandas as pd
from hmmlearn import hmm
import numpy as np
from matplotlib import cm, pyplot as plt
from matplotlib.dates import YearLocator, MonthLocator

df = pd.read_csv( "SnP500_1Yhist.csv",
                  header      = 0,
                  index_col   = "Date",
                  parse_dates = True
                  )
df["Returns"] = df["Adj Close"].pct_change()
df.dropna( inplace = True )

hmm_model = hmm.GaussianHMM( n_components    = 4,
                             covariance_type = "full",
                             n_iter          = 100
                             )               # %Create the model
df           = df["Returns"]                 # %Extract the wanted column of data
training_set = np.column_stack( df )         # %Shape = [1,250]
hmm_model.fit( training_set )                # %This is where I get the error
The error I get is:
ValueError Traceback (most recent call last)
<ipython-input-51-c8f66806fad6> in <module>
9 print(training_set.shape)
10 print(training_set)
---> 11 hmm_model.fit(training_set)
~/Git Projects/Aiguille Systems/allocationmodel/macromodelv2_venv/lib/python3.9/site-packages/hmmlearn/base.py in fit(self, X, lengths)
460 """
461 X = check_array(X)
--> 462 self._init(X, lengths=lengths)
463 self._check()
464
~/Git Projects/Aiguille Systems/allocationmodel/macromodelv2_venv/lib/python3.9/site-packages/hmmlearn/hmm.py in _init(self, X, lengths)
205 kmeans = cluster.KMeans(n_clusters=self.n_components,
206 random_state=self.random_state)
--> 207 kmeans.fit(X)
208 self.means_ = kmeans.cluster_centers_
209 if self._needs_init("c", "covars_"):
~/Git Projects/Aiguille Systems/allocationmodel/macromodelv2_venv/lib/python3.9/site-packages/sklearn/cluster/_kmeans.py in fit(self, X, y, sample_weight)
1033 accept_large_sparse=False)
1034
-> 1035 self._check_params(X)
1036 random_state = check_random_state(self.random_state)
1037
~/Git Projects/Aiguille Systems/allocationmodel/macromodelv2_venv/lib/python3.9/site-packages/sklearn/cluster/_kmeans.py in _check_params(self, X)
956 # n_clusters
957 if X.shape[0] < self.n_clusters:
--> 958 raise ValueError(f"n_samples={X.shape[0]} should be >= "
959 f"n_clusters={self.n_clusters}.")
960
ValueError: n_samples=1 should be >= n_clusters=4.
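Q: "... it doesn't seem to work."
Well, it does. If you test your actual training_set (which we cannot reproduce here) before calling the .fit() method, you will see the direct cause of the reported error: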
N_COMPONENTS = 4
ERR_MASK     = ( "ERR: training_set was smaller than the N_COMPONENTS == {0:} "
               + "requested,\n"
               + "     whereas the actual shape[0] was {1:}"
                 )
...
hmm_model = hmm.GaussianHMM( n_components    = N_COMPONENTS,
                             covariance_type = "full",
                             n_iter          = 100
                             )
...
# Fit only when there are at least as many samples (rows) as hidden states
if training_set.shape[0] >= N_COMPONENTS:
    hmm_model.fit( training_set )
else:
    print( ERR_MASK.format( N_COMPONENTS, training_set.shape[0] ) )
~/Git Projects/Aiguille Systems/allocationmodel/macromodelv2_venv/lib/python3.9/site-packages/sklearn/cluster/_kmeans.py in _check_params(self, X)
956 # n_clusters
957 if X.shape[0] < self.n_clusters:
--> 958 raise ValueError(f"n_samples={X.shape[0]} should be >= "
959 f"n_clusters={self.n_clusters}.")
--------------------------------------------------X.shape[0]------------
--------------------------------------------------X.shape[0]------------
ValueError: n_samples=1 should be >= n_clusters=4.
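The check highlighted above can be reproduced without the original CSV. The following is a minimal sketch that assumes synthetic returns (a hypothetical stand-in for the 250 rows of "Returns") and uses NumPy only. It shows that np.column_stack() on a 1-D input yields a single row, so X.shape[0] is 1:

```python
import numpy as np

# Synthetic stand-in for the 250 daily returns (assumption: the real CSV
# is not available here, so random values are used instead).
rng = np.random.default_rng(0)
returns = rng.normal(loc=0.0, scale=0.01, size=250)

N_COMPONENTS = 4

# np.column_stack() iterates over its argument, so a 1-D input is treated as
# 250 separate scalars, each of which becomes one column of a single row.
training_set = np.column_stack(returns)

n_samples, n_features = training_set.shape
print(f"shape={training_set.shape}: n_samples={n_samples}, n_features={n_features}")
print("enough samples for the KMeans initialisation:", n_samples >= N_COMPONENTS)
```

In other words, the array is read as one sample with 250 features, not as 250 samples with one feature, which is why the KMeans initialisation inside hmmlearn refuses to build 4 clusters.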
fit( X, lengths = None )
Estimate model parameters.
An initialization step is performed before entering the EM algorithm.
If you want to avoid this step for a subset of the parameters,
pass proper init_params keyword argument to estimator’s constructor.
Parameters
X ( array-like, shape ( n_samples, n_features ) )
– Feature matrix of individual samples.
lengths ( array-like of integers, shape ( n_sequences, ) )
– Lengths of the individual sequences in X.
The sum of these should be n_samples.
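Given the ( n_samples, n_features ) shape that fit() expects, one possible fix is to pass the returns as a column vector of shape (250, 1). This is only a sketch: the helper names as_single_feature_matrix and fit_gaussian_hmm are hypothetical, the returns are synthetic, and hmmlearn is imported lazily so that the reshaping part runs even where the library is not installed:

```python
import numpy as np


def as_single_feature_matrix(values):
    """Return a 1-D sequence as a 2-D array of shape (n_samples, 1)."""
    arr = np.asarray(values, dtype=float)
    if arr.ndim != 1:
        raise ValueError(f"expected 1-D input, got shape {arr.shape}")
    return arr.reshape(-1, 1)


def fit_gaussian_hmm(X, n_components=4):
    """Fit a GaussianHMM with the same settings as in the question."""
    from hmmlearn import hmm  # imported lazily; requires hmmlearn to be installed

    model = hmm.GaussianHMM(n_components=n_components,
                            covariance_type="full",
                            n_iter=100)
    return model.fit(X)


# Synthetic stand-in for df["Returns"] (assumption: 250 daily returns).
rng = np.random.default_rng(0)
returns = rng.normal(loc=0.0, scale=0.01, size=250)

training_set = as_single_feature_matrix(returns)
print(training_set.shape)  # rows are samples, the single column is the feature
```

With the data from the question, the equivalent call would be training_set = as_single_feature_matrix( df["Returns"] ), followed by fit_gaussian_hmm( training_set ). With 250 rows, the n_samples >= n_clusters check passes.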