
TFX IndexError on Evaluator component

I'm trying to build an Evaluator for my model. So far every other component works fine, but when I try this config:

import tensorflow_model_analysis as tfma

eval_config = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(label_key='Category'),
    ],
    metrics_specs=tfma.metrics.default_multi_class_classification_specs(),
    slicing_specs=[
        tfma.SlicingSpec(),
        tfma.SlicingSpec(feature_keys=['Category'])
    ])

to build this Evaluator:

model_resolver = ResolverNode(
      instance_name='latest_blessed_model_resolver',
      resolver_class=latest_blessed_model_resolver.LatestBlessedModelResolver,
      model=Channel(type=Model),
      model_blessing=Channel(type=ModelBlessing))
context.run(model_resolver)

evaluator = Evaluator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'],
    baseline_model=model_resolver.outputs['model'],
    eval_config=eval_config)
context.run(evaluator)

I get this error:

[...]
IndexError                                Traceback (most recent call last)
/opt/miniconda3/envs/archiving/lib/python3.7/site-packages/apache_beam/runners/common.cpython-37m-darwin.so in apache_beam.runners.common.DoFnRunner.process()

/opt/miniconda3/envs/archiving/lib/python3.7/site-packages/apache_beam/runners/common.cpython-37m-darwin.so in apache_beam.runners.common.PerWindowInvoker.invoke_process()

/opt/miniconda3/envs/archiving/lib/python3.7/site-packages/apache_beam/runners/common.cpython-37m-darwin.so in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window()

/opt/miniconda3/envs/archiving/lib/python3.7/site-packages/apache_beam/runners/common.cpython-37m-darwin.so in apache_beam.runners.common._OutputProcessor.process_outputs()

/opt/miniconda3/envs/archiving/lib/python3.7/site-packages/apache_beam/runners/worker/operations.cpython-37m-darwin.so in apache_beam.runners.worker.operations.SingletonConsumerSet.receive()

/opt/miniconda3/envs/archiving/lib/python3.7/site-packages/apache_beam/runners/worker/operations.cpython-37m-darwin.so in apache_beam.runners.worker.operations.PGBKCVOperation.process()

/opt/miniconda3/envs/archiving/lib/python3.7/site-packages/apache_beam/runners/worker/operations.cpython-37m-darwin.so in apache_beam.runners.worker.operations.PGBKCVOperation.process()

/opt/miniconda3/envs/archiving/lib/python3.7/site-packages/tensorflow_model_analysis/evaluators/metrics_and_plots_evaluator_v2.py in add_input(self, accumulator, element)
    355     for i, (c, a) in enumerate(zip(self._combiners, accumulator)):
--> 356       result = c.add_input(a, get_combiner_input(elements[0], i))
    357       for e in elements[1:]:

/opt/miniconda3/envs/archiving/lib/python3.7/site-packages/tensorflow_model_analysis/metrics/calibration_histogram.py in add_input(self, accumulator, element)
    141             flatten=True,
--> 142             class_weights=self._class_weights)):
    143       example_weight = float(example_weight)

/opt/miniconda3/envs/archiving/lib/python3.7/site-packages/tensorflow_model_analysis/metrics/metric_util.py in to_label_prediction_example_weight(inputs, eval_config, model_name, output_name, sub_key, class_weights, flatten, squeeze, allow_none)
    283     elif sub_key.top_k is not None:
--> 284       label, prediction = select_top_k(sub_key.top_k, label, prediction)
    285 

/opt/miniconda3/envs/archiving/lib/python3.7/site-packages/tensorflow_model_analysis/metrics/metric_util.py in select_top_k(top_k, labels, predictions, scores)
    621   if not labels.shape or labels.shape[-1] == 1:
--> 622     labels = one_hot(labels, predictions)
    623 

/opt/miniconda3/envs/archiving/lib/python3.7/site-packages/tensorflow_model_analysis/metrics/metric_util.py in one_hot(tensor, target)
    671   # indexing the -1 and then removing it after.
--> 672   tensor = np.delete(np.eye(target.shape[-1] + 1)[tensor], -1, axis=-1)
    673   return tensor.reshape(target.shape)

IndexError: arrays used as indices must be of integer (or boolean) type

During handling of the above exception, another exception occurred:
[...]

IndexError: arrays used as indices must be of integer (or boolean) type [while running 'ExtractEvaluateAndWriteResults/ExtractAndEvaluate/EvaluateMetricsAndPlots/ComputeMetricsAndPlots()/ComputePerSlice/ComputeUnsampledMetrics/CombinePerSliceKey/WindowIntoDiscarding']
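The last frame of the traceback (metric_util.one_hot) indexes an identity matrix with the label array. The NumPy sketch below re-implements only the two lines visible in the traceback, for illustration; it is not TFMA's full code, and the predictions and labels are made-up values. It shows why integer labels work while raw string labels raise this exact IndexError:

```python
import numpy as np


def one_hot_like_tfma(labels: np.ndarray, predictions: np.ndarray) -> np.ndarray:
    """Mirror of the two lines of TFMA's one_hot shown in the traceback.

    An extra identity column is added so that a label of -1 maps to an
    all-zero row once that last column is deleted.
    """
    encoded = np.delete(np.eye(predictions.shape[-1] + 1)[labels], -1, axis=-1)
    return encoded.reshape(predictions.shape)


# Model output: 2 examples x 5 classes (arbitrary values for illustration).
predictions = np.full((2, 5), 0.2)

# Integer-encoded labels can be used as row indices into np.eye.
int_labels = np.array([[2], [0]])
print(one_hot_like_tfma(int_labels, predictions))

# Raw byte-string labels (like the untransformed 'Category' feature) cannot.
str_labels = np.array([[b"sport"], [b"business"]])
try:
    one_hot_like_tfma(str_labels, predictions)
except IndexError as err:
    print("IndexError:", err)
```

In other words, the metric code expects the label to already be an integer class index by the time it reaches the Evaluator.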

I thought it was my config, but I can't see what's wrong with it.

I'm using this dataset: Kaggle - BBC News Classification. I followed this notebook, TFX - Chicago Taxi, in order to serve my model with TensorFlow Serving.

Note: the model I'm using looks like this:

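import absl.logging
import tensorflow as tf
from tensorflow.keras import layers, losses
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization

# _max_features and _embedding_dim are module-level hyperparameters
# defined elsewhere in the training module (not shown).


def _build_keras_model(vectorize_layer: TextVectorization) -> tf.keras.Model:
  # One raw text string per example.
  input_layer = tf.keras.layers.Input(shape=(1,), dtype=tf.string)

  deep = vectorize_layer(input_layer)
  deep = layers.Embedding(_max_features + 1, _embedding_dim)(deep)
  deep = layers.Dropout(0.5)(deep)
  deep = layers.GlobalAveragePooling1D()(deep)
  deep = layers.Dropout(0.5)(deep)

  # 5 news categories; softmax turns the scores into probabilities.
  output = layers.Dense(5, activation=tf.nn.softmax)(deep)

  model = tf.keras.Model(input_layer, output)
  model.compile(
      # The output is already a softmax, so the loss must not treat it as logits.
      loss=losses.SparseCategoricalCrossentropy(from_logits=False),
      optimizer='adam',
      metrics=['accuracy'])
  model.summary(print_fn=absl.logging.info)
  return model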
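Because the final Dense layer already applies softmax, SparseCategoricalCrossentropy should be built with from_logits=False; passing from_logits=True makes the loss treat those probabilities as logits and apply softmax a second time. This is separate from the Evaluator error. The NumPy sketch below uses a hand-written cross-entropy (not the Keras implementation) and a made-up logit vector to show how the double softmax distorts the loss:

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def sparse_cce(label: int, values: np.ndarray, from_logits: bool) -> float:
    """Sparse categorical cross-entropy for a single example (NumPy sketch)."""
    probs = softmax(values) if from_logits else values
    return float(-np.log(probs[label]))


logits = np.array([4.0, 0.0, 0.0, 0.0, 0.0])  # confident, correct guess for class 0
probs = softmax(logits)  # what a Dense(5, activation=softmax) layer would output

print(sparse_cce(0, probs, from_logits=False))  # small loss, as expected
print(sparse_cce(0, probs, from_logits=True))   # softmax applied twice: loss stays large
```

With the double softmax, inputs confined to [0, 1] can never produce a sharply peaked distribution, so the loss stays high even for correct, confident predictions.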

I got it to work. My problem was that in the dataset the label (the document category) is a string (e.g. "sport", "business", ...). To encode it as an integer, I used the Transform component to preprocess it.
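Conceptually, that Transform step maps each category string to an integer index through a vocabulary. In a real TFX preprocessing_fn this is typically done with tft.compute_and_apply_vocabulary, which orders the vocabulary by frequency by default. The plain Python/NumPy sketch below only mimics the idea on a toy list of BBC categories; the helper names are hypothetical, and breaking ties alphabetically is an assumption of this sketch, not TF Transform's documented behavior:

```python
from collections import Counter

import numpy as np


def build_vocabulary(labels):
    """Order distinct labels by descending frequency, ties broken alphabetically.

    Approximates a frequency-ordered vocabulary; not the TF Transform code.
    """
    counts = Counter(labels)
    return sorted(counts, key=lambda label: (-counts[label], label))


def apply_vocabulary(labels, vocabulary, default_value=-1):
    # Map each string label to its vocabulary index; unknown labels get default_value.
    index = {label: i for i, label in enumerate(vocabulary)}
    return np.array([index.get(label, default_value) for label in labels], dtype=np.int64)


raw = ["sport", "business", "sport", "tech", "politics", "entertainment", "sport"]
vocab = build_vocabulary(raw)
print(vocab)
print(apply_vocabulary(raw, vocab))
print(apply_vocabulary(["weather"], vocab))  # unseen label maps to -1
```

Only the transformed examples carry these integer labels; the ExampleGen output still contains the original strings.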

However, when building the Evaluator component, I passed it the examples from the ExampleGen component, on which no preprocessing had been done. So the Evaluator received the raw string labels, while the model predicts over integer-encoded classes; as the last frame of the traceback shows, TFMA's one_hot helper then tries to use those strings as array indices, which raises the IndexError.

So, to fix this, I simply did this:

model_resolver = ResolverNode(
      instance_name='latest_blessed_model_resolver',
      resolver_class=latest_blessed_model_resolver.LatestBlessedModelResolver,
      model=Channel(type=Model),
      model_blessing=Channel(type=ModelBlessing))
context.run(model_resolver)

evaluator = Evaluator(
    examples=transform.outputs['transformed_examples'],
    model=trainer.outputs['model'],
    baseline_model=model_resolver.outputs['model'],
    eval_config=eval_config)
context.run(evaluator)

I used the examples from the Transform component instead. Of course, I also changed the label key in the config to match the label name produced by the Transform component.

I don't know if there is a "cleaner" way to do this (or, if I'm doing this all wrong, please correct me!).
