
TensorFlow Object Detection API - High RAM/CPU usage - no GPU usage

I'm playing around with the TensorFlow Object Detection API, and I'm having some issues with training models. In particular, CPU and RAM usage is very high, while the GPU is basically not used at all (according to Windows Task Manager):

[image: Windows Task Manager screenshot]

I have installed the TF Object Detection API according to this guide, and I have verified that the GPU is successfully recognised:

python -c "import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

2021-07-20 15:36:37.630320: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021-07-20 15:36:49.683811: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2021-07-20 15:36:49.990907: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3080 Laptop GPU computeCapability: 8.6
coreClock: 1.605GHz coreCount: 48 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-07-20 15:36:50.017685: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021-07-20 15:36:50.142257: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2021-07-20 15:36:50.158525: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2021-07-20 15:36:50.173970: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2021-07-20 15:36:50.183516: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2021-07-20 15:36:50.196516: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2021-07-20 15:36:50.213625: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2021-07-20 15:36:50.231417: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2021-07-20 15:36:50.234253: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-20 15:36:50.238133: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-07-20 15:36:50.245602: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3080 Laptop GPU computeCapability: 8.6
coreClock: 1.605GHz coreCount: 48 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-07-20 15:36:50.265550: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-20 15:36:54.162700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-20 15:36:54.168506: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0
2021-07-20 15:36:54.169910: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N
2021-07-20 15:36:54.176538: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5177 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3080 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6)
tf.Tensor(1222.1837, shape=(), dtype=float32)
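
The one-liner above shows that TensorFlow can create a GPU device, but not which device a given op ends up on. A slightly longer check can report that as well. This is only a minimal sketch, not part of the original post: the function name `gpu_report` is made up for illustration, and the import is guarded solely so the script still runs (and says so) on a machine without TensorFlow.

```python
try:
    import tensorflow as tf
except ImportError:  # assumption: report cleanly instead of crashing when TF is absent
    tf = None


def gpu_report():
    """Return the TF version, the visible GPUs, and the device a small matmul ran on."""
    if tf is None:
        return {"tensorflow": None, "gpus": [], "matmul_device": None}
    gpus = tf.config.list_physical_devices("GPU")
    a = tf.random.normal([256, 256])
    product = tf.matmul(a, a)  # eager op; TensorFlow picks the device
    return {
        "tensorflow": tf.__version__,
        "gpus": [gpu.name for gpu in gpus],
        "matmul_device": product.device,
    }


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If `matmul_device` ends in `GPU:0`, plain TensorFlow ops do reach the GPU, which would point at the training pipeline for this particular model rather than at the installation.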

EDIT: I have had this problem only with centernet_hg104_512x512_coco17_tpu-8 (where I am using the pipeline.config file shown below), while other models (ssd_resnet or efficientdet) actually use the GPU.

model {
  center_net {
    num_classes: 1
    feature_extractor {
      type: "hourglass_104"
      channel_means: 104.01361846923828
      channel_means: 114.03422546386719
      channel_means: 119.91659545898438
      channel_stds: 73.60276794433594
      channel_stds: 69.89082336425781
      channel_stds: 70.91507720947266
      bgr_ordering: true
    }
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 512
        max_dimension: 512
        pad_to_max_dimension: true
      }
    }
    object_detection_task {
      task_loss_weight: 1.0
      offset_loss_weight: 1.0
      scale_loss_weight: 0.10000000149011612
      localization_loss {
        l1_localization_loss {
        }
      }
    }
    object_center_params {
      object_center_loss_weight: 1.0
      classification_loss {
        penalty_reduced_logistic_focal_loss {
          alpha: 2.0
          beta: 4.0
        }
      }
      min_box_overlap_iou: 0.6
      max_box_predictions: 50
    }
  }
}
train_config {
  batch_size: 2
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_crop_image {
      min_aspect_ratio: 0.5
      max_aspect_ratio: 1.7000000476837158
      random_coef: 0.25
    }
  }
  data_augmentation_options {
    random_adjust_hue {
    }
  }
  data_augmentation_options {
    random_adjust_contrast {
    }
  }
  data_augmentation_options {
    random_adjust_saturation {
    }
  }
  data_augmentation_options {
    random_adjust_brightness {
    }
  }
  data_augmentation_options {
    random_absolute_pad_image {
      max_height_padding: 200
      max_width_padding: 200
      pad_color: 0.0
      pad_color: 0.0
      pad_color: 0.0
    }
  }
  optimizer {
    adam_optimizer {
      learning_rate {
        manual_step_learning_rate {
          initial_learning_rate: 0.0010000000474974513
          schedule {
            step: 1000
            learning_rate: 9.999999747378752e-05
          }
          schedule {
            step: 5000
            learning_rate: 9.999999747378752e-06
          }
        }
      }
      epsilon: 1.0000000116860974e-07
    }
    use_moving_average: false
  }
  fine_tune_checkpoint: "pre-trained-models/centernet_hg104_512x512_coco17_tpu-8/checkpoint/ckpt-0"
  num_steps: 5000
  max_number_of_boxes: 50
  unpad_groundtruth_tensors: false
  fine_tune_checkpoint_type: "detection"
  fine_tune_checkpoint_version: V2
}
train_input_reader {
  label_map_path: "annotations/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "annotations/train.record"
  }
}
eval_config {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 1
}
eval_input_reader {
  label_map_path: "annotations/label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "annotations/test.record"
  }
}

The small batch_size of 2 is needed because any larger value makes the training script get stuck for a bit and then quit (this is also quite surprising to me, but I have just started playing around with this stuff, so maybe it's actually normal).

I'm using:

Windows 10

CPU: i9-10980HK

RAM: 32GB

GPU: RTX 3080 Laptop GPU, 8GB dedicated memory

tensorflow = 2.5

CUDA = 11.3.1

cuDNN = 8.2.1.32

Is this low GPU/high CPU usage expected? Am I missing something here? Thanks for the help, and please let me know if I can provide any other useful info.

In \research\object_detection\legacy\train.py, add the snippet below:

import os
# Expose only GPU 0 to CUDA; set this before TensorFlow initializes its GPU devices.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

I faced the same issue, and this worked for me.
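
For reference, the same idea can be wrapped in a small helper so the chosen index is easy to change and to verify. This is a hedged sketch using only the standard library; `select_gpu` is a hypothetical name, not something provided by TensorFlow or the Object Detection API.

```python
import os


def select_gpu(index=0):
    """Expose only the GPU with the given index to CUDA-based libraries."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Call this before `import tensorflow`, so the variable is already set
# when CUDA is initialized.
visible = select_gpu(0)
print("CUDA_VISIBLE_DEVICES =", visible)
```

Assuming training is launched from a different script than legacy\train.py (for example model_main_tf2.py for TF2 models), the same lines would presumably go at the top of that script instead.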
