Slow training on Google Cloud ML Engine with a GPU

Question

Sorry if my question is dumb, but I have spent a lot of time trying to understand the cause of this problem and could not, so here it is.

I'm training a Tacotron model on Google Cloud ML. I trained it before on FloydHub and it was pretty fast, so I configured my project to run on Google ML.

These are the major changes I made to my project.

Original:

with open(metadata_filename, encoding='utf-8') as f:
  self._metadata = [line.strip().split('|') for line in f]
  hours = sum((int(x[2]) for x in self._metadata)) * hparams.frame_shift_ms / (3600 * 1000)
  log('Loaded metadata for %d examples (%.2f hours)' % (len(self._metadata), hours))

My version:

# Requires: from tensorflow.python.lib.io import file_io
with file_io.FileIO(metadata_filename, 'r') as f:
  self._metadata = [line.strip().split('|') for line in f]
  hours = sum((int(x[2]) for x in self._metadata)) * hparams.frame_shift_ms / (3600 * 1000)
  log('Loaded metadata for %d examples (%.2f hours)' % (len(self._metadata), hours))

Original:

def _get_next_example(self):
    '''Loads a single example (input, mel_target, linear_target, cost) from disk'''
    if self._offset >= len(self._metadata):
      self._offset = 0
      random.shuffle(self._metadata)
    meta = self._metadata[self._offset]
    self._offset += 1

    text = meta[3]
    if self._cmudict and random.random() < _p_cmudict:
      text = ' '.join([self._maybe_get_arpabet(word) for word in text.split(' ')])

    input_data = np.asarray(text_to_sequence(text, self._cleaner_names), dtype=np.int32)
    linear_target = np.load(os.path.join(self._datadir, meta[0]))
    mel_target = np.load(os.path.join(self._datadir, meta[1]))
    return (input_data, mel_target, linear_target, len(linear_target))

My version:

def _get_next_example(self):
    '''Loads a single example (input, mel_target, linear_target, cost) from disk'''
    if self._offset >= len(self._metadata):
        self._offset = 0
        random.shuffle(self._metadata)
    meta = self._metadata[self._offset]
    self._offset += 1

    text = meta[3]
    if self._cmudict and random.random() < _p_cmudict:
        text = ' '.join([self._maybe_get_arpabet(word) for word in text.split(' ')])

    input_data = np.asarray(text_to_sequence(text, self._cleaner_names), dtype=np.int32)
    # Requires: from io import BytesIO
    # Read each .npy file as bytes through file_io, then parse it in memory.
    f = BytesIO(file_io.read_file_to_string(
        os.path.join(self._datadir, meta[0]), binary_mode=True))
    linear_target = np.load(f)
    s = BytesIO(file_io.read_file_to_string(
        os.path.join(self._datadir, meta[1]), binary_mode=True))
    mel_target = np.load(s)
    return (input_data, mel_target, linear_target, len(linear_target))

Here are two screenshots showing the difference: Google ML, FloydHub.

This is the training command I use on Google ML, with scale-tier=BASIC_GPU:

gcloud ml-engine jobs submit training "$JOB_NAME" \
    --stream-logs \
    --module-name trainier.train \
    --package-path trainier \
    --staging-bucket "$BUCKET_NAME" \
    --region "us-central1" \
    --scale-tier=basic-gpu \
    --config ~/gp-master/config.yaml \
    --runtime-version=1.4 \
    -- \
    --base_dir "$BASEE_DIR" \
    --input "$TRAIN_DATA"

So my question is: did I do something that could cause this slowness, perhaps in how the data is read, or is there a problem in Google Cloud ML itself (which I doubt)?
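One way to tell slow data reading apart from slow computation is to time the loading call on its own and compare it with the time per training step. The following is only a minimal sketch using the standard library: `average_seconds_per_call` and `fake_load_example` are hypothetical names, and the `time.sleep` call merely stands in for a real loader such as `_get_next_example`.

```python
import time


def average_seconds_per_call(fn, calls=20):
    """Call fn() `calls` times and return the mean wall-clock seconds per call."""
    start = time.perf_counter()
    for _ in range(calls):
        fn()
    return (time.perf_counter() - start) / calls


def fake_load_example():
    # Hypothetical stand-in for a data-loading call such as
    # feeder._get_next_example(); replace it with the real call.
    time.sleep(0.01)


print("load: %.4f s per example" % average_seconds_per_call(fake_load_example, calls=5))
```

If the loading time per example is small compared with the time per training step, the bottleneck is more likely in the computation than in the file reads.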

Answer 1

OK, I figured it out: I should have put tensorflow-gpu==1.4 instead of tensorflow==1.4 in the required packages ^^
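To illustrate the change, here is a small sketch of the dependency list before and after the fix, plus a check that flags the CPU-only pin. The list names and the helper `cpu_only_tensorflow_pins` are hypothetical; the assumption is that the list is passed to `install_requires` in the trainer package's `setup.py`, which the original post does not show.

```python
import re

# Hypothetical dependency lists, as they might be passed to
# setuptools.setup(install_requires=...) in the trainer's setup.py.
REQUIRED_PACKAGES_BEFORE = ["tensorflow==1.4"]      # CPU-only build
REQUIRED_PACKAGES_AFTER = ["tensorflow-gpu==1.4"]   # GPU build


def cpu_only_tensorflow_pins(requirements):
    """Return the entries that name the plain `tensorflow` package."""
    pins = []
    for req in requirements:
        # The package name is everything before the first version
        # operator, whitespace, extras bracket, or environment marker.
        name = re.split(r"[<>=!~\s\[;]", req.strip(), maxsplit=1)[0].lower()
        if name == "tensorflow":
            pins.append(req)
    return pins


print(cpu_only_tensorflow_pins(REQUIRED_PACKAGES_BEFORE))
print(cpu_only_tensorflow_pins(REQUIRED_PACKAGES_AFTER))
```

With TensorFlow 1.x the plain `tensorflow` package is the CPU-only build, so requesting it alongside a GPU scale tier leaves the GPU unused.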

Accepted answer (score 3), 2018-03-08 14:16:05
