How can I run my Apache Beam pipeline with a local CSV file when using TensorFlow Extended?
The CSV data has to be decoded before it is passed to TFT, and Beam's TextIO does not handle that on its own, so how is this supposed to work?
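import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions
import tensorflow_transform as tft
import tensorflow_transform.beam as tft_beam
from tensorflow_transform.tf_metadata import dataset_metadata, schema_utils

# CSV (column names), feature_specs, input_path and MapAndFilterErrors are defined elsewhere.
def run_pipeline():  # renamed from beam() so it does not shadow the apache_beam alias
    output_path = "./output"
    # Build the metadata before the coder that reads its schema.
    raw_metadata = dataset_metadata.DatasetMetadata(
        schema_utils.schema_from_feature_spec(feature_specs))
    options = PipelineOptions()
    options.view_as(StandardOptions).runner = "DirectRunner"
    with beam.Pipeline(options=options) as pipeline:
        with tft_beam.Context(temp_dir="./tmp"):
            converter = tft.coders.CsvCoder(CSV, raw_metadata.schema)
            raw_data = (
                pipeline
                | 'ReadTrainData' >> beam.io.ReadFromText(input_path, skip_header_lines=1)
                | 'FixCommasTrainData' >> beam.Map(lambda line: line.replace(', ', ','))
                | 'DecodeTrainData' >> MapAndFilterErrors(converter.decode))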
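If the coder is the sticking point, one option is to decode each line with Python's standard csv module instead. The following is only a minimal sketch, not the method used in the question: the column names, the set of numeric columns, and the helper name decode_csv_line are all hypothetical and would have to match the real file.

```python
import csv

# Hypothetical column layout; replace with the columns of the real CSV file.
COLUMNS = ["age", "city", "income"]
NUMERIC_COLUMNS = {"age", "income"}


def decode_csv_line(line, columns=COLUMNS, numeric_columns=NUMERIC_COLUMNS):
    """Parse one CSV line into a {column: value} dict, or None if it is malformed."""
    # skipinitialspace=True drops the blank after each comma, which covers
    # the same case as the ', ' -> ',' replacement step in the question.
    values = next(csv.reader([line], skipinitialspace=True))
    if len(values) != len(columns):
        return None  # wrong number of fields (includes empty lines)
    row = {}
    for name, value in zip(columns, values):
        if name in numeric_columns:
            try:
                row[name] = float(value)
            except ValueError:
                return None  # non-numeric text in a numeric column
        else:
            row[name] = value
    return row
```

Inside the pipeline this could replace the decode step, for example with beam.Map(decode_csv_line) followed by beam.Filter(lambda row: row is not None) to drop the lines that failed to parse.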