
Am I training with GPU?

I'm training a neural network model with Keras, using TensorFlow as the backend. The log file starts with the following messages:

nohup: ignoring input
2019-02-12 17:44:29.414526: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-02-12 17:44:30.191565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:65:00.0
totalMemory: 7.93GiB freeMemory: 7.81GiB
2019-02-12 17:44:30.191601: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2019-02-12 17:44:30.409790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-12 17:44:30.409828: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0 
2019-02-12 17:44:30.409834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N 
2019-02-12 17:44:30.410015: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7535 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:65:00.0, compute capability: 6.1)

Does this mean that the training is performed on the GPU?

I would say yes, but when I run nvtop, I see that all of the GPU memory is in use while 0% of the GPU compute capacity is used (see the nvtop screenshot below):

[Screenshot: nvtop output]

Also, when I type htop in the command line, I see that one CPU is fully used (see the htop screenshot below).

[Screenshot: htop output]

How come the GPU memory is used, yet the computation runs on the CPU instead of the GPU?

I think you have compiled TensorFlow (or installed a pre-compiled package) with CUDA support, but without support for all the instructions available on your CPU (your CPU supports the AVX2, AVX512F and FMA instructions that TensorFlow can use).
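
As a quick sanity check (a minimal sketch, assuming a TensorFlow 1.x installation, which matches the source paths in your log), you can confirm from Python that the binary was built with CUDA and that the GPU is visible:

import tensorflow as tf
from tensorflow.python.client import device_lib

# True if this TensorFlow binary was compiled with CUDA (GPU) support
print(tf.test.is_built_with_cuda())

# True if TensorFlow can actually use a GPU right now
print(tf.test.is_gpu_available())

# Lists every device TensorFlow sees; a working setup includes /device:GPU:0
print(device_lib.list_local_devices())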

This means that TensorFlow will work fine (with full GPU support), but you can't use your processor at full capacity.
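
If you want to verify that your ops really land on the GPU rather than the CPU, you can turn on device-placement logging. A minimal sketch, assuming the TensorFlow 1.x session API your log comes from:

import tensorflow as tf

# Ask TensorFlow to print the device each operation is assigned to
config = tf.ConfigProto(log_device_placement=True)

with tf.Session(config=config) as sess:
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]], name='a')
    b = tf.constant([[1.0, 1.0], [0.0, 1.0]], name='b')
    c = tf.matmul(a, b, name='matmul_test')
    print(sess.run(c))

Each op should be reported on /device:GPU:0 if the GPU is being used. With Keras on the TensorFlow backend, the same config can be applied by calling keras.backend.set_session(tf.Session(config=config)) before building the model.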

Try comparing the time (GPU vs CPU) with this example: https://stackoverflow.com/a/54661896/10418812
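
In the same spirit, here is a minimal GPU-vs-CPU timing sketch (not the linked answer verbatim; it assumes TensorFlow 1.x and a visible /device:GPU:0):

import time
import tensorflow as tf

def time_matmul(device_name):
    # Pin a large matrix multiplication to the requested device
    with tf.device(device_name):
        x = tf.random_normal([4000, 4000])
        y = tf.matmul(x, x)
    with tf.Session() as sess:
        start = time.time()
        sess.run(y)  # note: the first run also pays a one-time initialization cost
        return time.time() - start

print('GPU time:', time_matmul('/device:GPU:0'))
print('CPU time:', time_matmul('/device:CPU:0'))

On a GTX 1080 the GPU run should be dramatically faster than the CPU run, which is the simplest confirmation that the heavy math is really happening on the GPU.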
