I am trying to fit an LSTM model in PyTorch. My data is too big to be read into memory, so I want to create mini-batches of data using PyTorch's DataLoader class.
I have two input features (X1, X2) and one output feature (y). I am using 365 timesteps of X1 and X2 as the features used to predict y.
The dimensions of my training array are:
(n_observations, n_timesteps, n_features) == (9498, 365, 2)
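For reference, this (n_observations, n_timesteps, n_features) layout is what torch.nn.LSTM expects when it is constructed with batch_first=True. Below is a minimal sketch, assuming a single-layer LSTM whose last-timestep output feeds a linear layer; the LSTMRegressor class and hidden_size=64 are hypothetical choices, not taken from the question.

```python
import torch
from torch import nn


class LSTMRegressor(nn.Module):
    """Hypothetical model: maps (batch, timesteps, features) to (batch, 1)."""

    def __init__(self, n_features=2, hidden_size=64):  # hidden_size is an arbitrary assumption
        super().__init__()
        # batch_first=True makes the LSTM accept (batch, timesteps, features)
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)         # out: (batch, timesteps, hidden_size)
        return self.head(out[:, -1])  # use only the last timestep -> (batch, 1)


model = LSTMRegressor()
x = torch.rand(256, 365, 2)  # one mini-batch in the question's layout
y_hat = model(x)
print(y_hat.shape)  # torch.Size([256, 1])
```

Taking only the last timestep is one common way to reduce a sequence to a single prediction; pooling over all timesteps would be an equally valid choice.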
I don't understand why the code below isn't working, because I have seen other examples where the X and y tensors have different numbers of dimensions (LSTM for runoff modelling, PyTorch's own docs):
import numpy as np
import torch
from torch.utils.data import DataLoader
train_x = torch.Tensor(np.random.random((9498, 365, 2)))
train_y = torch.Tensor(np.random.random((9498, 1)))
val_x = torch.Tensor(np.random.random((1097, 365, 2)))
val_y = torch.Tensor(np.random.random((1097, 1)))
test_x = torch.Tensor(np.random.random((639, 365, 2)))
test_y = torch.Tensor(np.random.random((639, 1)))
train_dataset = (train_x, train_y)
test_dataset = (test_x, test_y)
val_dataset = (val_x, val_y)
train_dataloader = DataLoader(train_dataset, batch_size=256)
iterator = train_dataloader.__iter__()
iterator.next()
Output:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-47-2a0b28b53c8f> in <module>
13
14 iterator = train_dataloader.__iter__()
---> 15 iterator.next()
/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py in __next__(self)
344 def __next__(self):
345 index = self._next_index() # may raise StopIteration
--> 346 data = self._dataset_fetcher.fetch(index) # may raise StopIteration
347 if self._pin_memory:
348 data = _utils.pin_memory.pin_memory(data)
/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
45 else:
46 data = self.dataset[possibly_batched_index]
---> 47 return self.collate_fn(data)
/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py in default_collate(batch)
53 storage = elem.storage()._new_shared(numel)
54 out = elem.new(storage)
---> 55 return torch.stack(batch, 0, out=out)
56 elif elem_type.__module__ == 'numpy' and elem_type.__name__ != 'str_' \
57 and elem_type.__name__ != 'string_':
RuntimeError: invalid argument 0: Tensors must have same number of dimensions: got 4 and 3 at /tmp/pip-req-build-4baxydiv/aten/src/TH/generic/THTensor.cpp:680
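torch.utils.data.DataLoader expects a torch.utils.data.Dataset (or any object whose __getitem__ returns one sample and whose __len__ returns the number of samples). You're passing a tuple of tensors instead. A tuple does support len() and indexing, so the DataLoader accepts it, but len(train_dataset) is 2 and train_dataset[0] is the whole train_x tensor. The loader therefore builds a single batch [train_x, train_y], and default_collate tries to torch.stack a tensor of shape (9498, 365, 2) with one of shape (9498, 1). After stacking adds a batch dimension these are 4-D and 3-D, hence "Tensors must have same number of dimensions: got 4 and 3".
I suggest you use torch.utils.data.TensorDataset, which indexes all of its tensors along the first dimension, as follows: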
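import torch
from torch.utils.data import DataLoader, TensorDataset
train_x = torch.rand(9498, 365, 2)
train_y = torch.rand(9498, 1)
# TensorDataset indexes both tensors along dim 0, so each item is one (x, y) sample
train_dataset = TensorDataset(train_x, train_y)
train_dataloader = DataLoader(train_dataset, batch_size=256)
for x, y in train_dataloader:
    # torch.Size([256, 365, 2]) for full batches; the final batch holds 26 samples
    print(x.shape)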
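To see why the original tuple failed while TensorDataset works, compare how each behaves under len() and indexing, the two operations the DataLoader relies on for a map-style dataset. A small sketch using the question's shapes:

```python
import torch
from torch.utils.data import TensorDataset

train_x = torch.rand(9498, 365, 2)
train_y = torch.rand(9498, 1)

# A plain tuple: len() counts the tensors, and indexing returns a whole tensor
as_tuple = (train_x, train_y)
print(len(as_tuple))      # 2
print(as_tuple[0].shape)  # torch.Size([9498, 365, 2])

# TensorDataset: len() counts samples, and indexing returns one (x, y) pair
as_dataset = TensorDataset(train_x, train_y)
print(len(as_dataset))    # 9498
x0, y0 = as_dataset[0]
print(x0.shape, y0.shape)  # torch.Size([365, 2]) torch.Size([1])
```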
Check if it solves your problem.
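Note that TensorDataset still needs both tensors fully loaded in memory, while the question says the data does not fit. One option, sketched here under the assumption that the arrays have been saved as .npy files, is a custom Dataset that memory-maps them with np.load(..., mmap_mode="r") and copies out one sample per __getitem__ call. The MemmapDataset class and the file names are hypothetical, and the demo uses a small random array instead of the real data.

```python
import os
import tempfile

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class MemmapDataset(Dataset):
    """Hypothetical lazily loaded dataset backed by memory-mapped .npy files."""

    def __init__(self, x_path, y_path):
        # mmap_mode="r" maps the files without reading them fully into RAM
        self.x = np.load(x_path, mmap_mode="r")  # (n_observations, n_timesteps, n_features)
        self.y = np.load(y_path, mmap_mode="r")  # (n_observations, 1)

    def __len__(self):
        return self.x.shape[0]

    def __getitem__(self, idx):
        # np.array(...) copies just this sample out of the memory map
        x = torch.from_numpy(np.array(self.x[idx], dtype=np.float32))
        y = torch.from_numpy(np.array(self.y[idx], dtype=np.float32))
        return x, y


# Demo with small random files (hypothetical names) in a temporary directory
tmp = tempfile.mkdtemp()
x_path = os.path.join(tmp, "train_x.npy")
y_path = os.path.join(tmp, "train_y.npy")
np.save(x_path, np.random.random((1000, 365, 2)).astype(np.float32))
np.save(y_path, np.random.random((1000, 1)).astype(np.float32))

loader = DataLoader(MemmapDataset(x_path, y_path), batch_size=256)
batch_shapes = []
for x, y in loader:
    batch_shapes.append((tuple(x.shape), tuple(y.shape)))
    print(x.shape, y.shape)
```

Only the samples in the current batch are copied into RAM; the rest of each file stays on disk until it is indexed.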