
Use multiple GPUs in TensorFlow for inference with a pb model

I use a server with 8 Titan X GPUs and am trying to predict images much faster than with a single GPU. I load the pb model like this:

    model_dir = "./model"
    model = "nasnet_large_v1.pb"
    model_path = os.path.join(model_dir, model)
    model_graph = tf.Graph()
    with model_graph.as_default():
        with tf.gfile.GFile(model_path, 'rb') as f:
            graph_def = tf.GraphDef()
            graph_def.ParseFromString(f.read())
            _ = tf.import_graph_def(graph_def, name='')
            input_layer = model_graph.get_tensor_by_name("input:0")
            output_layer = model_graph.get_tensor_by_name('final_layer/predictions:0')

Then I start iterating over the files in the ./data_input directory like this:

    with tf.Session(graph=model_graph, config=config) as inference_session:
        # Initialize session
        initializer = np.zeros([1, 331, 331, 3])
        print("Initialing session...")
        inference_session.run(output_layer, feed_dict={input_layer: initializer})
        print("Done initialing.")

        # Prediction
        file_list = []
        processed_files = []

        for path, dir, files in os.walk('./model_output/processed_files'):
            for file in files:
                processed_files.append(file.split('_')[0]+'.tfrecord')

        print("Processed files: ")
        for f in processed_files:
            print('\t', f)

        while True:
            for path, dir, files in os.walk("./data_input"):
                for file in files:
                    if file == '.DS_Store': continue
                    if file in processed_files: continue
                    print("Reading file {}".format(file))
                    file_path = os.path.join('./data_input', file)
                    file_list.append(file_path)
                    res = predict(file_path)
                    processed_files.append(file)

                    with open('./model_output/processed_files/{}_{}_processed_files.json'.format(file.split('.')[0], model.split('.')[0]), 'w') as f:
                        f.write(json.dumps(processed_files))

                    with open('./model_output/classify_result/{}_{}_classify_result.json'.format(file.split('.')[0], model.split('.')[0]), 'w') as f:
                        f.write(json.dumps(res, indent=4, separators=(',',':')))

            time.sleep(1)

In the predict() function, I wrote code like this:

    label_map = get_label()
    # read tfrecord file by tf.data
    dataset = get_dataset(filename)
    # dataset.apply(tf.contrib.data.prefetch_to_device("/gpu:0"))
    # load data
    iterator = dataset.make_one_shot_iterator()
    features = iterator.get_next()

    result = []
    count = 0
    # session
    with tf.Session() as sess:
        # a frozen graph has no variables to initialize, but if it did,
        # the initializer op would have to be run, not merely created
        sess.run(tf.global_variables_initializer())
        t1 = time.time()
        try:
            while True:
                [_image, _label, _filepath] = sess.run(fetches=features)
                _image = np.asarray([_image])
                _image = _image.reshape(-1, 331, 331, 3)

                predictions = inference_session.run(output_layer, feed_dict={input_layer: _image})
                predictions = np.squeeze(predictions)

                # res = []
                for i, pred in enumerate(predictions):
                    count += 1
                    overall_result = np.argmax(pred)
                    predict_result = label_map[overall_result].split(":")[-1]

                    if predict_result == 'unknown': continue

                    # use a fresh dict per image; reusing a single dict object
                    # would make every entry in result reference the same data
                    content = {}
                    content['prob'] = str(np.max(pred))
                    content['label'] = predict_result
                    content['filepath'] = str(_filepath[i], encoding='utf-8')
                    result.append(content)

        except tf.errors.OutOfRangeError:
            t2 = time.time()
            print("{} images processed, average time: {}s".format(count, (t2-t1)/count))
    return result

I tried using with tf.device('/gpu:{}'.format(i)) around the model-loading part, the inference session part, or the session part, but nvidia-smi shows that only GPU0 is used at 100%, while the other GPUs don't seem to do any work even though their memory is allocated.
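For reference, this is roughly what the attempted placement looked like around the loading part (a sketch; i is the GPU index):

with model_graph.as_default():
    # pinning the whole import to one device places every node there
    with tf.device('/gpu:{}'.format(i)):
        _ = tf.import_graph_def(graph_def, name='')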

What should I do to make all the GPUs run at the same time and boost prediction speed?

My code is at https://github.com/tzattack/image_classification_algorithms.

You can force the device for each node in the graph like this:

def load_network(graph, i):
    # rewrite the device of every node in the frozen graph:
    # GPU i if i >= 0, otherwise fall back to the CPU
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(graph, 'rb') as fid:
        serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)
    for node in od_graph_def.node:
        node.device = '/gpu:{}'.format(i) if i >= 0 else '/cpu:0'
    return {"od_graph_def": od_graph_def}

Then you can merge the multiple graphs you get (one per GPU) into one, changing the tensor names if you are using the same model for all GPUs, and run everything in a single session.

Works perfectly for me.
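A minimal sketch of that merging step, assuming the same frozen graph_def is reused for every GPU (the name scopes gpu_0, gpu_1, ... are illustrative; input:0 and final_layer/predictions:0 are the tensor names from the question):

import numpy as np
import tensorflow as tf

def build_multi_gpu_graph(graph_def, num_gpus):
    # import the same frozen graph once per GPU under a distinct name scope,
    # so all copies coexist in one tf.Graph with unique tensor names
    graph = tf.Graph()
    inputs, outputs = [], []
    with graph.as_default():
        for i in range(num_gpus):
            with tf.device('/gpu:{}'.format(i)):
                tf.import_graph_def(graph_def, name='gpu_{}'.format(i))
            inputs.append(graph.get_tensor_by_name('gpu_{}/input:0'.format(i)))
            outputs.append(graph.get_tensor_by_name(
                'gpu_{}/final_layer/predictions:0'.format(i)))
    return graph, inputs, outputs

graph, inputs, outputs = build_multi_gpu_graph(graph_def, num_gpus=8)
config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(graph=graph, config=config) as sess:
    # one run call feeds a different batch to each GPU's copy of the model
    batches = [np.zeros([1, 331, 331, 3]) for _ in range(8)]  # dummy batches
    preds = sess.run(outputs, feed_dict=dict(zip(inputs, batches)))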

It can be done as follows:

def get_frozen_graph(graph_file):
    """Read Frozen Graph file from disk."""
    with tf.gfile.GFile(graph_file, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    return graph_def

trt_graph1 = get_frozen_graph('/home/ved/ved_1/frozen_inference_graph.pb')

with tf.device('/gpu:1'):
    [tf_input_l1, tf_scores_l1, tf_boxes_l1, tf_classes_l1, tf_num_detections_l1, tf_masks_l1] = tf.import_graph_def(trt_graph1, 
                    return_elements=['image_tensor:0', 'detection_scores:0', 
                    'detection_boxes:0', 'detection_classes:0','num_detections:0', 'detection_masks:0'])
    
tf_sess1 = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))

trt_graph2 = get_frozen_graph('/home/ved/ved_2/frozen_inference_graph.pb')

with tf.device('/gpu:0'):
    [tf_input_l2, tf_scores_l2, tf_boxes_l2, tf_classes_l2, tf_num_detections_l2] = tf.import_graph_def(trt_graph2, 
                    return_elements=['image_tensor:0', 'detection_scores:0', 
                    'detection_boxes:0', 'detection_classes:0','num_detections:0'])
    
tf_sess2 = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
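To keep both GPUs busy at the same time, the two sessions can be driven from separate Python threads; sess.run releases the GIL while the graph executes, so the two calls overlap. A hypothetical driver sketch (the input shape is illustrative):

import threading
import numpy as np

results = {}

def run_detector(name, sess, tf_input, fetches, image):
    # each thread drives one session, and therefore one GPU
    results[name] = sess.run(fetches, feed_dict={tf_input: image})

image = np.zeros([1, 600, 600, 3], dtype=np.uint8)  # dummy frame
t1 = threading.Thread(target=run_detector, args=(
    'gpu1', tf_sess1, tf_input_l1,
    [tf_scores_l1, tf_boxes_l1, tf_classes_l1, tf_num_detections_l1], image))
t2 = threading.Thread(target=run_detector, args=(
    'gpu0', tf_sess2, tf_input_l2,
    [tf_scores_l2, tf_boxes_l2, tf_classes_l2, tf_num_detections_l2], image))
t1.start(); t2.start()
t1.join(); t2.join()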
