
How can I tell if H2O 3.11.0.266 is running with GPUs?

I've installed H2O 3.11.0.266 on Ubuntu 16.04 with CUDA 8.0 and libcudnn.so.5.1.10, so I believe H2O should be able to find my GPUs.

However, when I start up my h2o.init() in Python, I do not see evidence that it is actually using my GPUs. I see:

  • H2O cluster total cores: 8
  • H2O cluster allowed cores: 8

which is the same as I had in the previous version (pre GPU).
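For reference, this is essentially all I run to start the cluster (a minimal sketch; nothing in it asks for GPUs explicitly):

import h2o

# h2o.init() prints the cluster summary quoted above (total cores,
# allowed cores, ...), but nothing about GPUs.
h2o.init()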

Also, http://127.0.0.1:54321/flow/index.html shows only 8 cores as well.

I wonder whether I don't have something properly installed, or whether the latest h2o.init() simply doesn't report information about which GPUs are available...

Many thanks in advance.

[edit] I should have mentioned that 3.11.0.266 is supposed to be the version that supports GPUs.

[edit] Thanks for all the suggestions. I'm now running H2O 3.13.0.337.

I also found this command useful:

 sudo watch -n 0.1 'ps f -o user,pgrp,pid,pcpu,pmem,start,time,command -p `/usr/bin/lsof -n -w -t /dev/nvidia*`'

But, I'm a tad puzzled.

When I run XGBoost, I clearly see that the GPUs are very active at 30 to 40% utilization (as are all 8 of my CPU cores, which I guess must be managing the GPUs). XGBoost finishes my classification problem in 20 seconds.
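For what it's worth, the XGBoost run is essentially the sketch below (the CSV path and column name are placeholders for my own data; backend and gpu_id are, as I understand it, the H2OXGBoostEstimator parameters that select the GPU plugin, but check the docs for your build):

import h2o
from h2o.estimators import H2OXGBoostEstimator

h2o.init()

# "my_data.csv" and "label" are placeholders for my own classification data.
frame = h2o.import_file("my_data.csv")
frame["label"] = frame["label"].asfactor()

# backend="gpu" / gpu_id=0 are what I believe asks H2O's XGBoost to run on
# the GPU; all columns other than the response are used as predictors.
xgb = H2OXGBoostEstimator(ntrees=50, backend="gpu", gpu_id=0)
xgb.train(y="label", training_frame=frame)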

GLM runs pretty fast (done in less than a second), so it's a little hard to tell whether it's using my GPUs. It does start the clock in the STARTED column displayed by the ps command.

USER      PGRP   PID %CPU %MEM  STARTED     TIME COMMAND
user      3380  3380  116 12.0 10:52:56 04:36:36 /usr/local/anaconda2/bin/java -ea -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -jar /usr/local/anaconda2/lib/python2.7/site-packages/h

Distributed Random Forest starts the clock too, but it doesn't seem to use any GPU processing, though it does use all the CPU cores.

GBM is similar. It takes 1.5 minutes to train on the same problem, compared to 20 seconds for XGBoost. Since the algorithms are similar, I would have expected them to take a similar amount of time and use the GPUs in a similar way. I find this surprising.

I'm convinced that XGBoost is working the GPUs, but I'm not sure if any of the other algorithms are.

[added]

By way of comparison, here are readings from H2O 3.13.0.341. Notice the difference in temperature(!) and GPU utilization percentage.

Here's what gpustat -cup shows when I run XGBoost:

[0] GeForce GTX 1080 | 64'C,  90 % |  1189 /  8105 MB | clem:java/31183(191M)

Here's what it shows when I run Distributed Random Forest (similar results occur for GBM and DeepLearning):

[0] GeForce GTX 1080 | 51'C,   5 % |  1187 /  8105 MB | clem:java/31183(189M)

You will need the GPU-enabled version of H2O, available on the H2O download page. It is not clear from your question whether you are using regular H2O or GPU-enabled H2O; however, if you are using GPU-enabled H2O and have the proper dependencies, it should see your GPUs. The current dependency list is as follows (see the quick check after the list):

  • Ubuntu 16.04
  • CUDA 8.0
  • cuDNN 5.1
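A rough way to sanity-check the CUDA/cuDNN side from Python (my own sketch, not an H2O API; the sonames below are what CUDA 8.0 and cuDNN 5.1 typically install) is to see whether the shared libraries resolve:

import ctypes

# GPU-enabled H2O needs the CUDA runtime and cuDNN shared libraries to be
# visible to the dynamic loader (ldconfig / LD_LIBRARY_PATH).
for lib in ("libcudart.so.8.0", "libcudnn.so.5"):
    try:
        ctypes.CDLL(lib)
        print("%s: found" % lib)
    except OSError:
        print("%s: NOT found" % lib)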

I have opened a JIRA ticket to add some metadata in the h2o.init() printout so that you'll see information about your GPUs there (in a future release).

From a terminal window, run the nvidia-smi tool. Look at the utilization. If it's 0%, you're not using the GPUs.

In the example below, you can see Volatile GPU Utilization is 0%, so the GPUs are not being used.

$ nvidia-smi
Tue May 30 13:50:11 2017   
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 370.28                 Driver Version: 370.28                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:02:00.0     Off |                  N/A |
| 27%   30C    P8    10W / 180W |      1MiB /  8113MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 0000:03:00.0      On |                  N/A |
| 27%   31C    P8     9W / 180W |     38MiB /  8112MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    1      1599    G   /usr/lib/xorg/Xorg                              36MiB |
+-----------------------------------------------------------------------------+

I use the following handy little script to monitor GPU utilization for myself.

$ cat bin/gputop 
#!/bin/bash

watch -d -n 0.5 nvidia-smi
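If you would rather watch utilization from inside a Python session (say, next to your h2o code), the rough equivalent below polls nvidia-smi's query interface once per second; it's a small sketch of my own, not part of H2O.

import subprocess
import time

# Print per-GPU utilization and memory once per second (Ctrl-C to stop).
# --query-gpu / --format are standard nvidia-smi options.
while True:
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=index,name,utilization.gpu,memory.used",
        "--format=csv,noheader,nounits",
    ])
    print(out.decode().strip())
    time.sleep(1)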
