Unzipping files with Apache Beam (Python), but WriteToText writes all rows as columns

Question

I am very new to programming and Apache Beam. I am trying to read a large number of zip files from a GCS bucket, unzip them, and save the contents back to GCS as CSV files.

import apache_beam as beam
import apache_beam.io.fileio  # ensure the fileio submodule is loaded

with beam.Pipeline() as pipeline:
    readable_files = (
        pipeline
        | beam.io.fileio.MatchFiles('path/file/patter*.zip')
        | beam.io.fileio.ReadMatches()
        | beam.FlatMap(unzip)
        | beam.combiners.ToList())
    files_and_contents = (
        readable_files
        | beam.io.WriteToText('new', file_name_suffix='.csv'))

I am unzipping the files with this function:

import zipfile

def unzip(readable_file):
    print(readable_file)
    input_zip = zipfile.ZipFile(readable_file.open())
    # Yields a single dict per archive: {member name: raw bytes of that member}
    yield {name: input_zip.read(name) for name in input_zip.namelist()}

I have tested it with only two files, and all the lines were written as columns; here is an example. The header is one column, and every other line is a column as well.

CSV file saved

Answer 1

Try adding skip_header_lines=1 inside beam.io.fileio.ReadMatches().
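For what it's worth, here is a minimal sketch of another way to approach this, based on my reading of the code in the question rather than anything the poster confirmed. WriteToText writes one line per element of the PCollection. In the question, unzip yields one dict per archive and ToList then merges everything into a single element, so each file's entire contents end up on one output line. Yielding one element per CSV line and dropping ToList should avoid that. As far as I know, skip_header_lines is a parameter of ReadFromText rather than ReadMatches, so it is not used here. The names unzip_lines and run are hypothetical, and UTF-8 encoding of the zipped files is an assumption.

```python
import io
import zipfile


def unzip_lines(readable_file):
    """Yield every text line of every member of one zip archive."""
    # Read the archive into memory so ZipFile gets a seekable file object.
    data = readable_file.open().read()
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            # Assumption: the CSV members are UTF-8 encoded text.
            text = archive.read(name).decode("utf-8")
            for line in text.splitlines():
                yield line  # one PCollection element per CSV line


def run(pattern, output_prefix):
    """Hypothetical entry point; pattern and output_prefix are placeholders."""
    # Imported here so unzip_lines can be tried without Beam installed.
    import apache_beam as beam
    from apache_beam.io import fileio

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | fileio.MatchFiles(pattern)
            | fileio.ReadMatches()
            | beam.FlatMap(unzip_lines)
            # No ToList: each line stays its own element, so it gets its own row.
            | beam.io.WriteToText(output_prefix, file_name_suffix=".csv")
        )
```

Note that this sketch writes the lines of all archives into the same sharded output, so each file's header line will appear once per source file, and line order across files is not guaranteed.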


Solution 1
0 2022-05-19 07:42:58
