I switched from training on a single GPU to multiple GPUs, and the code now throws an error:
epoch       main/loss   validation/main/loss  elapsed_time
Exception in main training loop: '<' not supported between instances of 'list' and 'int'
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/chainer_p36/lib/python3.6/site-packages/chainer/training/trainer.py", line 318, in run
    entry.extension(self)
  File "/home/ubuntu/anaconda3/envs/chainer_p36/lib/python3.6/site-packages/chainer/training/extensions/evaluator.py", line 157, in __call__
    result = self.evaluate()
  File "/home/ubuntu/anaconda3/envs/chainer_p36/lib/python3.6/site-packages/chainer/training/extensions/evaluator.py", line 206, in evaluate
    in_arrays = self.converter(batch, self.device)
  File "/home/ubuntu/anaconda3/envs/chainer_p36/lib/python3.6/site-packages/chainer/dataset/convert.py", line 150, in concat_examples
    return to_device(device, _concat_arrays(batch, padding))
  File "/home/ubuntu/anaconda3/envs/chainer_p36/lib/python3.6/site-packages/chainer/dataset/convert.py", line 35, in to_device
    elif device < 0:
Will finalize trainer extensions and updater before reraising the exception.
Without a GPU the code worked fine. With a single GPU I got an out-of-memory error, so I moved to a p2.8xlarge instance, and now it throws the error above. Where is the problem and how can I solve it?
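The exception itself can be reproduced in plain Python: `to_device` compares its device argument with an int (`elif device < 0:`), which fails when a list of device IDs reaches it instead of a single ID. A minimal sketch of the failing comparison:

```python
# Minimal reproduction of the comparison inside to_device: comparing a
# list of device IDs with an int raises TypeError in Python 3.
device = [0, 1, 2, 3]  # hypothetical list of GPU IDs

try:
    device < 0
except TypeError as e:
    print(e)  # '<' not supported between instances of 'list' and 'int'
```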
num_gpus = 8
chainer.cuda.get_device_from_id(0).use()

# updater
if num_gpus > 0:
    updater = training.updater.ParallelUpdater(
        train_iter,
        optimizer,
        devices={('main' if device == 0 else str(device)): device
                 for device in range(num_gpus)},
    )
else:
    updater = training.updater.StandardUpdater(train_iter, optimizer,
                                               device=args.gpus)

# ... and so on.

# Training:
trainer.run()
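For reference, the `devices` dict comprehension above maps the name `'main'` to GPU 0 and the stringified ID to each remaining GPU, which is the shape `ParallelUpdater` expects (a sketch of the expression in isolation):

```python
num_gpus = 8

# Same comprehension as in the updater setup: GPU 0 is named 'main',
# the remaining GPUs are keyed by their ID as a string.
devices = {('main' if device == 0 else str(device)): device
           for device in range(num_gpus)}

print(devices['main'])  # 0
print(len(devices))     # 8
```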
Output:

epoch       main/loss   validation/main/loss  elapsed_time
Exception in main training loop: '<' not supported between instances of 'list' and 'int'
I expected the output to be:

epoch       main/loss   validation/main/loss  elapsed_time
1
2
3
... and so on until it converges.
It looks like an error raised by the Evaluator extension while it transfers data to the specified device. How are you specifying the device to Evaluator.__init__? Note that it should be a single device, not a collection of devices. This example may be a useful reference: https://github.com/chainer/chainer/blob/master/examples/mnist/train_mnist_data_parallel.py
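In other words, the dict of devices belongs only to ParallelUpdater; the Evaluator moves each validation batch to one device, so it should receive a single int (e.g. the 'main' GPU). A sketch mimicking the branch in chainer.dataset.convert.to_device shows why an int works and a list does not (the helper name and the commented-out trainer line are illustrative, not the library's API):

```python
def device_check(device):
    # Mimics the branch in to_device that raised: a negative ID means CPU,
    # a non-negative ID means that GPU; a list cannot be compared with 0.
    if device is None:
        return 'no transfer'
    elif device < 0:
        return 'cpu'
    return 'gpu %d' % device

print(device_check(0))   # 'gpu 0' -- a single int works
# device_check([0, 1])   # raises the TypeError seen above

# Hypothetical fix: pass one device ID to the Evaluator, not a dict/list.
# trainer.extend(extensions.Evaluator(test_iter, model, device=0))
```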