Google Cloud Dataflow (Python): function to read from and write to a .csv file?

I am not able to figure out the precise functions in the GCP Dataflow Python SDK that read from and write to .csv files (or any non-.txt files, for that matter). For BigQuery, I have figured out the following functions:

```python
beam.io.Read(beam.io.BigQuerySource('%Table_ID%'))
beam.io.Write(beam.io.BigQuerySink('%Table_ID%'))
```
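For context, here is a minimal sketch of how those two transforms sit in a pipeline. The table IDs are placeholders, and the missing output schema is an assumption for brevity (BigQuerySink normally needs a schema when the destination table does not already exist):

```python
import apache_beam as beam

# Sketch only: read every row from one BigQuery table and write it to another.
# 'project:dataset.input' and 'project:dataset.output' are placeholder table IDs.
with beam.Pipeline() as p:
    (p
     | 'ReadFromBQ' >> beam.io.Read(beam.io.BigQuerySource('project:dataset.input'))
     | 'WriteToBQ' >> beam.io.Write(beam.io.BigQuerySink('project:dataset.output')))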

For reading text files, the ReadFromText and WriteToText functions are known to me.

However, I am not able to find any examples for the GCP Dataflow Python SDK in which data is written to or read from .csv files. Could you please provide the GCP Dataflow Python SDK functions for reading from and writing to .csv files, in the same manner as the BigQuery functions above?

There is a CsvFileSource in the beam_utils PyPI package that reads .csv files, handles file headers, and accepts a custom delimiter. More information on how to use this source is in this answer. Hope that helps!
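A short sketch of what that can look like in a pipeline. The file path is a placeholder, and the behavior described in the comment (one dict per row, keyed by the header) reflects the linked answer; check the exact constructor arguments against the beam_utils version you install:

```python
import apache_beam as beam
from beam_utils.sources import CsvFileSource

# Sketch only: 'gs://my-bucket/input.csv' is a placeholder path. CsvFileSource
# consumes the header row, so each element arrives as a dict keyed by column name.
with beam.Pipeline() as p:
    rows = p | 'ReadCsv' >> beam.io.Read(CsvFileSource('gs://my-bucket/input.csv'))
    rows | 'PrintRows' >> beam.Map(print)
```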

CSV files are text files. The simplest (though somewhat inelegant) way of reading them is to do a ReadFromText and then split each line on its commas (e.g. beam.Map(lambda x: x.split(','))), as in the sketch below.
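A minimal end-to-end sketch of that approach. The bucket paths and the column selection are placeholder assumptions, and note that a plain split(',') breaks on quoted fields that contain commas:

```python
import apache_beam as beam
from apache_beam.io import ReadFromText, WriteToText

# Sketch only: paths are placeholders. skip_header_lines=1 drops the CSV
# header row; the naive split does not handle quoted commas.
with beam.Pipeline() as p:
    (p
     | 'ReadCsv' >> ReadFromText('gs://my-bucket/input.csv', skip_header_lines=1)
     | 'Parse' >> beam.Map(lambda line: line.split(','))
     | 'KeepFirstTwoCols' >> beam.Map(lambda fields: fields[:2])
     | 'Format' >> beam.Map(lambda fields: ','.join(fields))
     | 'WriteCsv' >> WriteToText('gs://my-bucket/output', file_name_suffix='.csv'))
```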

For a more elegant option, check out this question, or simply install the beam_utils package from pip and read with the beam_utils.sources.CsvFileSource source, as shown in the sketch above.
