Keras model uses only one GPU all the time
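I am trying to train a CNN model on an AWS EC2 p3.16xlarge instance, which has 8 GPUs. With a batch size of 500, only one GPU is used the whole time, even though the system has 8. When I increase the batch size to 1000, it still uses only one GPU, and training is actually slower than with a batch size of 500. If I increase the batch size to 2000, I get an out-of-memory error. How can I fix this?
I am using the TensorFlow backend. GPU utilization is as follows: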
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104 Driver Version: 410.104 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:00:17.0 Off | 0 |
| N/A 47C P0 69W / 300W | 15646MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:00:18.0 Off | 0 |
| N/A 44C P0 59W / 300W | 502MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2... On | 00000000:00:19.0 Off | 0 |
| N/A 45C P0 61W / 300W | 502MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2... On | 00000000:00:1A.0 Off | 0 |
| N/A 47C P0 64W / 300W | 502MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla V100-SXM2... On | 00000000:00:1B.0 Off | 0 |
| N/A 48C P0 62W / 300W | 502MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla V100-SXM2... On | 00000000:00:1C.0 Off | 0 |
| N/A 46C P0 61W / 300W | 502MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla V100-SXM2... On | 00000000:00:1D.0 Off | 0 |
| N/A 46C P0 65W / 300W | 502MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla V100-SXM2... On | 00000000:00:1E.0 Off | 0 |
| N/A 46C P0 63W / 300W | 502MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 15745 C python3 15635MiB |
| 1 15745 C python3 491MiB |
| 2 15745 C python3 491MiB |
| 3 15745 C python3 491MiB |
| 4 15745 C python3 491MiB |
| 5 15745 C python3 491MiB |
| 6 15745 C python3 491MiB |
| 7 15745 C python3 491MiB |
+-----------------------------------------------------------------------------+
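The process table above shows a single process (PID 15745) holding 15635MiB on GPU 0 but only 491MiB on each of GPUs 1 to 7, so nearly all of the model's memory lives on one device. The short sketch below makes that per-GPU breakdown explicit by parsing the process rows copied from the output. It assumes the collapsed-whitespace row format shown here; the 1 GiB cutoff used to call a GPU "busy" is an arbitrary, hypothetical threshold, not an `nvidia-smi` setting.

```python
import re

# Process rows copied verbatim from the nvidia-smi output in the question.
NVIDIA_SMI_PROCESSES = """\
| 0 15745 C python3 15635MiB |
| 1 15745 C python3 491MiB |
| 2 15745 C python3 491MiB |
| 3 15745 C python3 491MiB |
| 4 15745 C python3 491MiB |
| 5 15745 C python3 491MiB |
| 6 15745 C python3 491MiB |
| 7 15745 C python3 491MiB |
"""

# Row layout (assumed from the output above): | GPU PID Type ProcessName <n>MiB |
ROW = re.compile(r"^\|\s*(\d+)\s+(\d+)\s+\w+\s+(\S+)\s+(\d+)MiB\s*\|$")


def memory_per_gpu(text):
    """Return {gpu_index: MiB}, summed over every process row found in text."""
    usage = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            gpu, mib = int(m.group(1)), int(m.group(4))
            usage[gpu] = usage.get(gpu, 0) + mib
    return usage


def busy_gpus(usage, threshold_mib=1024):
    """GPUs whose total usage exceeds threshold_mib (hypothetical cutoff)."""
    return sorted(g for g, mib in usage.items() if mib > threshold_mib)


if __name__ == "__main__":
    usage = memory_per_gpu(NVIDIA_SMI_PROCESSES)
    for gpu in sorted(usage):
        print(f"GPU {gpu}: {usage[gpu]} MiB")
    print("GPUs holding more than 1 GiB:", busy_gpus(usage))
```

Run against the rows above, the last line prints `GPUs holding more than 1 GiB: [0]`, which matches the symptom described in the question: the other seven GPUs hold only a small, identical amount of memory each.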