TensorFlow C++ SetDefaultDevice in multi-GPU mode

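I want to load the same graph on multiple GPUs for inference, but I cannot associate the graph with a device using graph::SetDefaultDevice. The failure does not occur inside SetDefaultDevice itself; it occurs later, when a session is created from the graph. Here is a simple example extracted from TensorFlow's example_trainer.cc: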

#include <tensorflow/core/platform/env.h>
#include <tensorflow/core/public/session.h>
#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/graph/default_device.h"

int main() {
  using namespace tensorflow;
  using namespace tensorflow::ops;
  Scope root = Scope::NewRootScope();
  auto A = Const(root, { {3.f, 2.f}, {-1.f, 0.f} }); 
  auto b = Const(root, { {3.f, 5.f} }); 
  auto v = MatMul(root.WithOpName("v"), A, b, MatMul::TransposeB(true));

  GraphDef def;
  TF_CHECK_OK(root.ToGraphDef(&def));

  graph::SetDefaultDevice(false ? "/device:GPU:0" : "/cpu:0", &def);
  /*
  for (auto &node: *def.mutable_node()) {
        node.set_device("/cpu:0");
        std::cout << node.name() << " = '" << node.device() <<"'"<< std::endl;
  }
  std::cout << "=======================\n";
  */
  SessionOptions options;
  std::unique_ptr<Session> session(NewSession(options));
  TF_CHECK_OK(session->Create(def));
  return 0;
}

Running it produces the following error:

2018-09-06 18:18:13.853316: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-09-06 18:18:13.856079: F /home/daniel/tensorflow_cc/example/example.cpp:27] Non-OK-status: session->Create(def) status: Not found: No attr named '/cpu:0' in NodeDef:
     [[Node: Const = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [2,2] values: [3 2][-1]...>, _device="/cpu:0"]()]]
     [[Node: Const = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [2,2] values: [3 2][-1]...>, _device="/cpu:0"]()]]
Aborted (core dumped)

If I remove the SetDefaultDevice call, it works perfectly. I also tried this on a machine with GPUs, without success.

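I know the problem is not in SetDefaultDevice itself, because manually setting the device of every node (the commented-out loop in the example) ends up with the same error when the session is created: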

Const = '/cpu:0'
Const_1 = '/cpu:0'
v = '/cpu:0'
=======================
2018-09-06 18:15:05.966337: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-09-06 18:15:05.969048: F /home/daniel/tensorflow_cc/example/example.cpp:26] Non-OK-status: session->Create(def) status: Not found: No attr named '/cpu:0' in NodeDef:
     [[Node: Const = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [2,2] values: [3 2][-1]...>, _device="/cpu:0"]()]]
     [[Node: Const = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [2,2] values: [3 2][-1]...>, _device="/cpu:0"]()]]
Aborted (core dumped)
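
For reference, here is a minimal sketch of what I understand graph::SetDefaultDevice to do, and of how I intend to use it for the multi-GPU case. It assumes the helper only fills in the device of nodes that do not have one yet. FakeNode, FakeGraph, SetDefaultDeviceSketch and PinCopiesToGpus are hypothetical stand-ins, not TensorFlow types or functions, so that the snippet compiles without the library.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for tensorflow::NodeDef; only the fields used here.
struct FakeNode {
  std::string name;
  std::string device;  // empty means "no device assigned yet"
};

// Hypothetical stand-in for a GraphDef: just a list of nodes.
using FakeGraph = std::vector<FakeNode>;

// Assigns `device` to every node that has no device yet. Nodes that already
// carry a device are left unchanged (assumed behaviour of SetDefaultDevice).
void SetDefaultDeviceSketch(const std::string& device, FakeGraph* graph) {
  for (FakeNode& node : *graph) {
    if (node.device.empty()) {
      node.device = device;
    }
  }
}

// Returns one copy of `graph` per GPU index, each pinned to "/device:GPU:<i>".
// The input graph itself is not modified.
std::vector<FakeGraph> PinCopiesToGpus(const FakeGraph& graph, int num_gpus) {
  std::vector<FakeGraph> copies;
  for (int i = 0; i < num_gpus; ++i) {
    FakeGraph copy = graph;
    SetDefaultDeviceSketch("/device:GPU:" + std::to_string(i), &copy);
    copies.push_back(copy);
  }
  return copies;
}
```

With the real API, each copy would be a GraphDef handed to its own session->Create call, which is exactly the step that fails in the logs above.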

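This seems to be a problem only with the monolithic build (--config=monolithic), i.e. when building libtensorflow_cc.so. I am not sure, but it may be related to these issues:

https://github.com/tensorflow/tensorflow/issues/5379
https://github.com/tensorflow/tensorflow/issues/16291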
