How do I run my Apache Beam pipeline on a local CSV file when using TensorFlow Extended?

Question

CSV data has to be decoded before it is passed to TFT, and Beam's TextIO can't do that decoding on its own. So how is this supposed to work?
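Here, "decoding" means turning each raw text line emitted by `beam.io.ReadFromText` into a dict keyed by column name, with each value converted to the type the schema expects. The sketch below shows that step using only Python's standard `csv` module. It is a plain-Python stand-in, not TFT's own coder, and the column names and types in `COLUMNS` are hypothetical placeholders. A function like this could be passed to `beam.Map` right after `ReadFromText`.

```python
import csv

# Hypothetical column layout: (column name, type converter), in file order.
COLUMNS = [("age", int), ("education", str), ("hours_per_week", float)]


def decode_csv_line(line):
    """Parse one CSV line into a dict of typed feature values."""
    # skipinitialspace=True drops the space after each delimiter, so
    # "39, Bachelors" parses the same way as "39,Bachelors".
    values = next(csv.reader([line], skipinitialspace=True))
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}: {line!r}")
    return {name: cast(value) for (name, cast), value in zip(COLUMNS, values)}


print(decode_csv_line("39, Bachelors, 40.5"))
# {'age': 39, 'education': 'Bachelors', 'hours_per_week': 40.5}
```

Using `skipinitialspace=True` covers the same ", " cleanup that a separate `line.replace(', ', ',')` step would do, and it leaves commas inside quoted fields untouched.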

Answer 1

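import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions
import tensorflow_transform as tft
import tensorflow_transform.beam as tft_beam
from tensorflow_transform.tf_metadata import dataset_metadata, schema_utils

# Assumed to be defined elsewhere: CSV (ordered list of column names),
# feature_specs (feature name -> tf.io.FixedLenFeature etc.), input_path,
# and MapAndFilterErrors (a helper PTransform from the TFT census example).
raw_metadata = dataset_metadata.DatasetMetadata(
    schema_utils.schema_from_feature_spec(feature_specs))


# Named run_pipeline rather than beam so it does not shadow the apache_beam module.
def run_pipeline():
    output_path = "./output"
    options = PipelineOptions()
    options.view_as(StandardOptions).runner = "DirectRunner"  # run locally

    with beam.Pipeline(options=options) as pipeline:
        with tft_beam.Context(temp_dir="./tmp"):
            # Decodes one CSV line into a dict of features following the schema.
            # tft.coders.CsvCoder is the 2020-era TFT API; check your TFT version provides it.
            converter = tft.coders.CsvCoder(CSV, raw_metadata.schema)
            raw_data = (
                pipeline
                | 'ReadTrainData' >> beam.io.ReadFromText(input_path, skip_header_lines=1)
                # Normalize ", " separators to "," so the decoder sees clean fields.
                | 'FixCommasTrainData' >> beam.Map(lambda line: line.replace(', ', ','))
                | 'DecodeTrainData' >> MapAndFilterErrors(converter.decode))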
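`MapAndFilterErrors` is not part of Beam or TFT itself; it appears to be a helper defined in TFT's census example that maps a function over the elements and drops the ones that fail. If you don't have that helper, a minimal stand-in (hypothetical, not the example's actual code) is a wrapper that turns the decoder into a generator yielding zero or one results, which `beam.FlatMap` accepts directly:

```python
import logging


def filter_errors(fn, errors=(ValueError,)):
    """Wrap fn so it yields fn(element), or nothing if fn raises one of `errors`."""
    def wrapped(element):
        try:
            result = fn(element)
        except errors as err:
            logging.warning("Dropping element %r: %s", element, err)
            return
        yield result
    return wrapped


def parse_pair(line):
    # Hypothetical decoder used only for this demo: "a,b" -> {"a": int, "b": int}.
    a, b = line.split(",")
    return {"a": int(a), "b": int(b)}


decode = filter_errors(parse_pair)
rows = [row for line in ["1,2", "oops", "3,4"] for row in decode(line)]
print(rows)  # [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
```

In the pipeline this would read `| 'DecodeTrainData' >> beam.FlatMap(filter_errors(converter.decode))`. Only the listed exception types are swallowed, so genuine bugs still fail loudly; pass whatever exception types your decoder actually raises.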

See the official TFX guide for more details.

Answer 1 (score 0), posted 2020-09-25 15:53:33