How to train a model on a 2D array?

I am trying to figure out whether I am doing this right. So far I have written a function that fetches data through the Binance API and returns a pandas DataFrame. The DataFrame is then passed to a second function for normalization.

prep_data output:

 [[-1.72859016  1.15233296  1.27019847  1.30035581  1.28289379  0.02391226]
  [-1.72166195  1.30396819  1.41919755  1.47432163  1.52939992 -0.63048155]
  [-1.71473373  1.44612622  1.39126022  1.28102628  1.22600776 -0.29834563]
  ...
  [ 1.71473373 -0.64833537 -0.52710285 -0.68092153 -0.77448436 -0.61306561]
  [ 1.72166195 -0.82840221 -0.90891297 -0.84522258 -0.98306648 -0.30196802]
  [ 1.72859016 -1.03690065 -0.99272495 -1.20281898 -1.18216759  0.02003465]]

This is the array I would like to pass to the model. In other words, I want to pass each row to the model to train it. How do I split this array into training and testing segments?

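 import numpy as np
 import pandas as pd

 def prep_data(dff):
     # Convert the DataFrame to a 2D NumPy array (one row per candle)
     num_arr = dff.to_numpy()
     print(num_arr)
     # Standardize each column to zero mean and unit variance, ignoring NaNs
     normalized_num = (num_arr - np.nanmean(num_arr, axis=0)) / np.nanstd(num_arr, axis=0)
     print(len(normalized_num))
     print(normalized_num)
     # Return the array so it can be split and passed to a model
     return normalized_num


 def fore_cast():
     # `client` is assumed to be an already-configured Binance API client created elsewhere
     candles = client.get_klines(symbol="BNBBTC", interval=client.KLINE_INTERVAL_30MINUTE)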
     date_time = []
     open_lst = []
     high_lst = []
     low_lst = []
     close_lst = [] 
     volume_lst = []
     for item in candles:

         t_time = float(item[0])/1000
         date_time.append(t_time)
         open_lst.append(float(item[1]))
         high_lst.append(float(item[2]))
         low_lst.append(float(item[3]))
         close_lst.append(float(item[4]))
         volume_lst.append(float(item[5]))
     ## creating data frame 
     coin_data_frame = {
         'date_time' : date_time,
         'open'  : open_lst,
         'high'  : high_lst,
         'low'   : low_lst,
         'close' : close_lst,
         'volume': volume_lst,
     }
     df = pd.DataFrame(coin_data_frame , columns = [ 'date_time' , 'open' , 'high' , 'low' , 'close','volume' ])
     print(df.tail())
 
     return df
 
 

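If the train-test split is arbitrary, you can use scikit-learn's train_test_split. It takes the features and the target as separate arrays and returns a train and a test part for each:

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, y)  # X: feature columns, y: target column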
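
As a minimal sketch of the split itself, the example below uses a small synthetic array in place of the normalized candle data. Treating the last column as the target `y` and the remaining columns as the features `X` is an assumption made only for illustration; the question does not say which column is to be predicted.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the normalized array: 100 rows, 6 columns
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 6))

# Assumption for illustration: the last column is the target, the rest are features
X = data[:, :-1]
y = data[:, -1]

# Hold out 20% of the rows for testing; rows are shuffled by default
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(x_train.shape, x_test.shape)  # (80, 5) (20, 5)
```

Each row of the 2D array is one sample, so the split happens along the first axis and the columns are left intact.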

Note that if you have two classes and want the training set to hold exactly 80% of each class, the split is no longer arbitrary; it has to be stratified by class.
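
For that class-balanced case, a hedged sketch using the `stratify` parameter of scikit-learn's `train_test_split` might look like the following. The two-class labels here are hypothetical and generated only for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical two-class data: 70 samples of class 0 and 30 of class 1
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.array([0] * 70 + [1] * 30)

# stratify=y keeps the class proportions the same in the train and test parts
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Count how many samples of each class ended up in each part
print(np.bincount(y_train), np.bincount(y_test))
```

With these numbers the training set receives 80% of each class (56 and 24 samples) and the test set the remaining 20% (14 and 6).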

Then train the model on the returned x_train and y_train, and predict on x_test:

from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier()
clf.fit(x_train, y_train)
prediction = clf.predict(x_test)
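
Putting the steps together, the sketch below runs the whole flow end to end on synthetic data shaped like the normalized array. The label definition (1 when the next row's close is higher than the current row's) is purely an assumption for illustration; the question does not define a target.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the normalized array; the six columns mimic
# date_time, open, high, low, close, volume
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 6))

# Assumption for illustration: label is 1 when the next row's close
# (column 4) is higher than the current row's close, else 0
X = data[:-1]
y = (data[1:, 4] > data[:-1, 4]).astype(int)

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

clf = RandomForestClassifier(random_state=42)
clf.fit(x_train, y_train)
prediction = clf.predict(x_test)

print(prediction.shape)             # one predicted label per test row
print(clf.score(x_test, y_test))    # mean accuracy on the held-out rows
```

A classifier needs discrete labels. If the target is a continuous value such as the next close price, a regressor such as `RandomForestRegressor` would be the counterpart, used with the same `fit` and `predict` calls.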
