How to train a model on a 2D array?
I am trying to figure out whether I am doing this right. So far I have written a function that fetches data through the Binance API and returns a pandas DataFrame; the DataFrame is then passed to a second function for normalization.
prep_data output:
[[-1.72859016 1.15233296 1.27019847 1.30035581 1.28289379 0.02391226]
[-1.72166195 1.30396819 1.41919755 1.47432163 1.52939992 -0.63048155]
[-1.71473373 1.44612622 1.39126022 1.28102628 1.22600776 -0.29834563]
...
[ 1.71473373 -0.64833537 -0.52710285 -0.68092153 -0.77448436 -0.61306561]
[ 1.72166195 -0.82840221 -0.90891297 -0.84522258 -0.98306648 -0.30196802]
[ 1.72859016 -1.03690065 -0.99272495 -1.20281898 -1.18216759 0.02003465]]
This is the array I would like to pass to the model; in other words, I want to pass each row to the model to train it. How do I split this array into training and testing sets?
import numpy as np

def prep_data(dff):
    num_arr = dff.to_numpy()
    print(num_arr)
    # z-score every column, ignoring NaNs
    normalized_num = (num_arr - np.nanmean(num_arr, axis=0)) / np.nanstd(num_arr, axis=0)
    print(len(normalized_num))
    print(normalized_num)
    return normalized_num
import pandas as pd

def fore_cast():
    # client is assumed to be a Binance API client instance created elsewhere
    candles = client.get_klines(symbol="BNBBTC", interval=client.KLINE_INTERVAL_30MINUTE)
    date_time = []
    open_lst = []
    high_lst = []
    low_lst = []
    close_lst = []
    volume_lst = []
    for item in candles:
        t_time = float(item[0]) / 1000  # open time: milliseconds to seconds
        date_time.append(t_time)
        open_lst.append(float(item[1]))
        high_lst.append(float(item[2]))
        low_lst.append(float(item[3]))
        close_lst.append(float(item[4]))
        volume_lst.append(float(item[5]))
    ## creating data frame
    coin_data_frame = {
        'date_time': date_time,
        'open': open_lst,
        'high': high_lst,
        'low': low_lst,
        'close': close_lst,
        'volume': volume_lst,
    }
    df = pd.DataFrame(coin_data_frame, columns=['date_time', 'open', 'high', 'low', 'close', 'volume'])
    print(df.tail())  # tail is a method, so it has to be called
    return df
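If the train-test split is arbitrary, you can use scikit-learn's train_test_split (it lives in sklearn.model_selection, not in pandas). Pass it the feature columns X and the target y separately:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, y)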
Note that if you have two classes and want the training set to hold exactly 80% of each class, the split is no longer arbitrary: it has to be stratified, which train_test_split supports through its stratify argument.
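A minimal sketch of such a non-arbitrary split, assuming scikit-learn is installed: passing stratify=y to train_test_split keeps the class proportions the same in both parts. The data, the 70/30 class balance, and the random seeds here are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 rows, 6 features, and an imbalanced binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = np.array([0] * 70 + [1] * 30)

# stratify=y keeps the 70/30 class ratio in both the train and the test part.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

train_counts = np.bincount(y_train)  # rows per class in the training set
test_counts = np.bincount(y_test)    # rows per class in the test set
print(train_counts, test_counts)
```

Each class ends up with 80% of its rows in the training set (56 of 70 and 24 of 30), which a plain random split does not guarantee.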
Then train the model on the returned x_train and y_train, and predict on x_test:
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier()
clf.fit(x_train, y_train)
prediction = clf.predict(x_test)
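Putting the pieces together, here is a rough end-to-end sketch. It assumes a 2D array shaped like the prep_data output (one row per candle, six normalized columns); the random data and the label definition are hypothetical and chosen only so the example runs. Pick whatever target actually fits your problem.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the prep_data output: 200 rows, 6 normalized columns.
rng = np.random.default_rng(42)
data = rng.normal(size=(200, 6))

# Assumption for illustration only: label a row 1 when column 4 ("close")
# is greater than column 1 ("open"), otherwise 0.
X = data
y = (data[:, 4] > data[:, 1]).astype(int)

# Hold out 20% of the rows for testing.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(x_train, y_train)             # each row of x_train is one training sample
prediction = clf.predict(x_test)      # one predicted label per test row
accuracy = clf.score(x_test, y_test)  # fraction of test rows predicted correctly
print(prediction.shape, accuracy)
```

scikit-learn estimators expect exactly this layout: a 2D array of shape (n_samples, n_features) for X and a 1D array of length n_samples for y, so there is no need to feed the rows in one at a time.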