Apply TensorFlow Transform to transform/scale features in production

Question

Overview

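I followed a guide to write TF Records, using tf.Transform to preprocess my features. Now I would like to deploy my model, which means applying the same preprocessing function to real, live data.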

My approach

First, suppose I have 2 features:

features = ['amount', 'age']

I have the transform_fn produced by Apache Beam, residing in working_dir=gs://path-to-transform-fn/

I then load the transform function with:

tf_transform_output = tft.TFTransformOutput(working_dir)

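I assumed the easiest way to serve in production would be to take a numpy array of processed data and call model.predict() (the model I use is invoked through model.predict()).

For this, I thought the transform_raw_features() method was exactly what I needed.

However, after building the schema: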

raw_features = {}
for k in features:
    raw_features.update({k: tf.constant(1)})

print(tf_transform_output.transform_raw_features(raw_features))

I get:

AttributeError: 'Tensor' object has no attribute 'indices'

I assume this happens because I used tf.VarLenFeature() when I defined the schema for my preprocessing_fn.

def preprocessing_fn(inputs):
    outputs = inputs.copy()

    for name in features:
        outputs[name] = tft.scale_to_z_score(outputs[name])

    return outputs
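
For intuition, the two phases behind a z-score scaling step can be sketched in plain Python. This is not the library's implementation: the helper names analyze and apply_z_score and the sample values are hypothetical, and the sketch assumes the population (biased) standard deviation. The point it illustrates is that the mean and standard deviation are computed once over the training data and then reused as constants when a live value arrives.

```python
import math


def analyze(values):
    """Analysis phase: one full pass over the training data."""
    mean = sum(values) / len(values)
    # Population variance (an assumption of this sketch).
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


def apply_z_score(value, mean, std):
    """Transform phase: uses the stored constants, no pass over the data."""
    return (value - mean) / std


# Hypothetical training values for one numeric feature.
train_values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
feature_mean, feature_std = analyze(train_values)

# Scale a single live value with the constants learned from the training data.
scaled = apply_z_score(9.0, feature_mean, feature_std)
```

At serving time only apply_z_score is needed, which is the role the saved transform_fn plays in the real pipeline.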

I build the metadata using:

RAW_DATA_FEATURE_SPEC = {}
for name in features:
    RAW_DATA_FEATURE_SPEC[name] = tf.VarLenFeature(dtype=tf.float32)

RAW_DATA_METADATA = dataset_metadata.DatasetMetadata(
    dataset_schema.from_feature_spec(RAW_DATA_FEATURE_SPEC))

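So in short, given a dictionary:

d = {'amount': [50], 'age': [32]}, I would like to apply this transform_fn and scale these values appropriately so they can be fed to my model for prediction. This dictionary is exactly the format of my PCollection before the data is processed by the pre_processing() function.

Pipeline structure: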

class BeamProccess():

    def __init__(self):

        # init

        self.run()

    def run(self):

        def preprocessing_fn(inputs):

            # outputs = { 'id' : [list], 'amount': [list], 'age': [list] }
            return outputs

        with beam.Pipeline(options=self.pipe_opt) as p:
            with beam_impl.Context(temp_dir=self.google_cloud_options.temp_location):
                data = (p
                        | "read_table" >> beam.io.Read(table_bq)
                        | "create_data" >> beam.ParDo(ProcessFn()))

                transformed_dataset, transform_fn = (
                    (train, RAW_DATA_METADATA)
                    | beam_impl.AnalyzeAndTransformDataset(preprocessing_fn))

                transformed_data, transformed_metadata = transformed_dataset

                transformed_data | "WriteTrainTFRecords" >> tfrecordio.WriteToTFRecord(
                    file_path_prefix=self.JOB_DIR + '/train/data',
                    file_name_suffix='.tfrecord',
                    coder=example_proto_coder.ExampleProtoCoder(transformed_metadata.schema))

                _ = (
                    transform_fn
                    | 'WriteTransformFn' >>
                    transform_fn_io.WriteTransformFn(path=self.JOB_DIR + '/transform/'))

Finally, the ParDo() is:

class ProcessFn(beam.DoFn):

    def process(self, element):

        yield { 'id' : [list], 'amount': [list], 'age': [list] }
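
The element format yielded above could be produced by a small helper like the following. This is only a sketch: row_to_features and the sample row are hypothetical, and it assumes each table row arrives as a dict of scalar columns, so every value is wrapped in a single-element list to match d = {'amount': [50], 'age': [32]}.

```python
def row_to_features(row, feature_names):
    """Wrap each scalar column in a single-element list, keyed by feature name."""
    return {name: [row[name]] for name in feature_names}


# Hypothetical row as it might arrive from the table read.
row = {'id': 'a1', 'amount': 50.0, 'age': 32.0}
element = row_to_features(row, ['id', 'amount', 'age'])
```

Inside a DoFn, process() would yield the result of such a helper for each incoming element.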

Answer 1

The problem is in this snippet:

raw_features = {}
for k in features:
    raw_features.update({k: tf.constant(1)})

print(tf_transform_output.transform_raw_features(raw_features))

In this code you construct a dictionary whose values are dense tensors. As you said, that will not work for a VarLenFeature. Instead of tf.constant, try tf.placeholder for a FixedLenFeature and tf.sparse_placeholder for a VarLenFeature.
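
For a single request such as d = {'amount': [50], 'age': [32]}, the value supplied for each sparse input needs three parts: indices, values and dense_shape (the indices attribute is the one the dense tensor in the question lacks). The sketch below builds those parts in plain Python. SparseValue and to_sparse_value are hypothetical names whose fields mirror tf.SparseTensorValue, and the batch size of 1 is an assumption.

```python
from collections import namedtuple

# Hypothetical stand-in with the same field names as tf.SparseTensorValue.
SparseValue = namedtuple('SparseValue', ['indices', 'values', 'dense_shape'])


def to_sparse_value(values):
    """Describe one example's variable-length values as a 1-row sparse batch."""
    # Every value sits in row 0; its column is its position in the list.
    indices = [[0, col] for col in range(len(values))]
    return SparseValue(indices=indices,
                       values=[float(v) for v in values],
                       dense_shape=[1, len(values)])


d = {'amount': [50], 'age': [32]}
feed_values = {name: to_sparse_value(vals) for name, vals in d.items()}
```

In a TensorFlow 1.x session, triples of this shape are what one would expect to pass through feed_dict for the matching tf.sparse_placeholder; that wiring is not shown here and should be checked against the installed version.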
