
How to add new CSV file data to a trained LSTM model to predict the next future value using Python

Here I have a CSV data file with four inputs. I want to predict the next value using an LSTM model, so first of all I train the LSTM model on the data. Here is my code:

import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense

data5 = pd.read_csv('data27.csv', sep=',')
data6 = pd.read_csv('data33.csv', sep=',')
data7 = pd.read_csv('data40.csv', sep=',')  # three CSV files that have the same columns
data5 = pd.DataFrame(data5, columns=['date', 'x1', 'x2', 'x3', 'x4'])
data6 = pd.DataFrame(data6, columns=['date', 'x1', 'x2', 'x3', 'x4'])
data7 = pd.DataFrame(data7, columns=['date', 'x1', 'x2', 'x3', 'x4'])
data8 = pd.concat([data5, data6, data7])  # stack the three files into one DataFrame

data8.set_index('date', inplace=True)

data8 = data8.values

sc = MinMaxScaler(feature_range=(0, 1))
train_data = sc.fit_transform(data8)

x_train = []
y_train = []
for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i, 0])  # previous 60 values of the first scaled column
    y_train.append(train_data[i, 0])       # the next value of that column as the target
x_train, y_train = np.array(x_train), np.array(y_train)
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))


model = Sequential()
model.add(LSTM(units=10, return_sequences=True, input_shape=(x_train.shape[1],1)))
model.add(LSTM(units=10))
model.add(Dense(units=1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(x_train, y_train, epochs=10, batch_size=32)
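
For reference, here is a quick shape check (a minimal sketch added for clarity, using the variables defined above): the scaler is fitted on all four feature columns, while the windowing loop feeds the model 60-step windows of the first column only.

# Sanity check (illustrative): shapes produced by the preprocessing above.
print(train_data.shape)  # (n_rows, 4)          -> four scaled feature columns
print(x_train.shape)     # (n_rows - 60, 60, 1) -> 60-step windows of column 0
print(y_train.shape)     # (n_rows - 60,)       -> next value of column 0 as the target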

After training the model, I tried to get predictions for the x1 column of a new CSV file that has the same columns (date, x1, x2, x3, x4). I wrote this code for that:

dataset_test = pd.read_csv('data56.csv')
dataset_total = pd.concat((data8['x1'], dataset_test['x1']), axis=0)
inputs = dataset_total[len(dataset_total) - len(dataset_test) - 60:].values
inputs = inputs.reshape(-1, 1)
inputs = sc.transform(inputs)

Then I got an error:

ValueError                                Traceback (most recent call last)
<ipython-input-62-0bcaba4a7ad4> in <module>()
----> 1 inputs = sc.transform(inputs)

~\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py in transform(self, X)
    367         X = check_array(X, copy=self.copy, dtype=FLOAT_DTYPES)
    368
--> 369         X *= self.scale_
    370         X += self.min_
    371         return X

ValueError: non-broadcastable output operand with shape (1153,1) doesn't match the broadcast shape (1153,4)

My CSV files used to train the model:

[screenshot: training CSV files]

After training the model, the new CSV file I use for testing:

[screenshot: new CSV file for testing]

I also got another error when doing the scaler inverse transform. Here is my code:

X_test = []
for i in range(3, inputs.shape[0]):
    X_test.append(inputs[i-3:i, 0])
X_test = np.array(X_test)

X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))

new_output = model.predict(X_test)
new_output = sc.inverse_transform(new_output)

Error:

ValueError                                Traceback (most recent call last)
<ipython-input-45-489f3f23c5d3> in <module>()
----> 1 glucose = sc.inverse_transform(glucose)

~\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py in inverse_transform(self, X)
    383         X = check_array(X, copy=self.copy, dtype=FLOAT_DTYPES)
    384
--> 385         X -= self.min_
    386         X /= self.scale_
    387         return X

ValueError: non-broadcastable output operand with shape (43,1) doesn't match the broadcast shape (43,4)

Can anyone help me to solve this error?

I changed my code and then got this error. Code:

X_test = []
for i in range(60, 80):
    X_test.append(inputs[i-60:i, 0])

X_test = np.array(X_test)

X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))

new_output = model.predict(X_test)
new_output = sc.inverse_transform(new_output)

Error:

ValueError                                Traceback (most recent call last)
<ipython-input-174-8e8d9c47ce3d> in <module>()
     17
     18 new_output = model.predict(X_test)
---> 19 new_output = sc.inverse_transform( new_output)

~\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py in inverse_transform(self, X)
    383         X = check_array(X, copy=self.copy, dtype=FLOAT_DTYPES)
    384
--> 385         X -= self.min_
    386         X /= self.scale_
    387         return X

ValueError: non-broadcastable output operand with shape (20,1) doesn't match the broadcast shape (20,8)

Why do you reshape your inputs to have a final dimension of 1 in the snippet?

dataset_test = pd.read_csv('data56.csv')
dataset_total = pd.concat((data8['x1'], dataset_test['x1']), axis=0)
inputs = dataset_total[len(dataset_total) - len(dataset_test) - 60:].values
inputs = inputs.reshape(-1, 1)
inputs = sc.transform(inputs)

Your scaler expects this data to have a shape of 4 in the last dimension.

So when you call data8['x1'] you're only taking one column, but the scaler was fitted on four feature columns, and you cannot change that after fitting. I suspect you should either remove the ['x1'] selection from this code, or fix 'data56.csv' so that it has five columns (date, x1, x2, x3, x4).
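
To see this quickly, you can inspect what the scaler learned when it was fitted (a small diagnostic sketch, using the sc and inputs variables from your code):

# Diagnostic sketch: a fitted MinMaxScaler stores one scale_/min_ entry per column,
# and transform() expects exactly that many columns.
print(sc.scale_.shape)  # (4,)   -> the scaler was fitted on four columns
print(inputs.shape)     # (n, 1) -> but only one column is being passed to transform()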

Edit

So, what I would change your code to is

dataset_test = pd.read_csv('data56.csv')
# Note: this assumes data8 is still a DataFrame at this point, i.e. the column
# selection happens before data8 = data8.values in your original code.
dataset_total = pd.concat((data8[['x1', 'x2', 'x3', 'x4']],
                           dataset_test[['x1', 'x2', 'x3', 'x4']]), axis=0)
inputs = dataset_total[len(dataset_total) - len(dataset_test) - 60:].values
inputs = inputs.reshape(-1, 4)
inputs = sc.transform(inputs)
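
The same logic explains your inverse_transform errors: the model predicts a single column while the scaler was fitted on four. One common workaround (a sketch, assuming x1 is the first of the four scaled columns and new_output has shape (n, 1)) is to pad the predictions back to four columns before inverting, then keep only the x1 column:

# Sketch: invert single-column predictions with a scaler fitted on 4 columns.
padded = np.zeros((new_output.shape[0], 4))  # dummy array with 4 columns
padded[:, 0] = new_output[:, 0]              # place the predictions in the x1 slot
new_output_unscaled = sc.inverse_transform(padded)[:, 0]  # back to original x1 units

Alternatively, fit a separate MinMaxScaler on the x1 column alone and use that one to inverse-transform the predictions.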
