
How to export data from BigQuery and store it as .csv in Google Storage

How do I extract the data from a table using the pipeline and store it as CSV in Google Storage? So far I've only been able to extract the data in a simple text format by extracting each field, concatenating the fields into a string, and then outputting it.

Does anyone know a method for this? Thanks.

Reading from BigQuery using BigQuery I/O

To read from a BigQuery table, you apply a BigQueryIO.Read transform. BigQueryIO.Read returns a PCollection of BigQuery TableRow objects, where each element in the PCollection represents a single row in the table.

You can read an entire BigQuery table by supplying the BigQuery table name to BigQueryIO.Read using the .from operation. The following example code shows how to apply the BigQueryIO.Read transform to read an entire BigQuery table:

    // Create the pipeline with default options.
    PipelineOptions options = PipelineOptionsFactory.create();
    Pipeline p = Pipeline.create(options);

    // Read every row of the public sample table as TableRow objects.
    PCollection<TableRow> weatherData = p.apply(
        BigQueryIO.Read
            .named("ReadWeatherStations")
            .from("clouddataflow-readonly:samples.weather_stations"));

Source: Reading from BigQuery

Writing to CSV using TextIO.Write

To output data to text files, apply TextIO.Write to the PCollection that you want to output. Keep the following things in mind when using TextIO.Write:

- You may only apply TextIO.Write to a PCollection<String>. You may need to use a simple ParDo to format your data from an intermediate PCollection to a PCollection<String> prior to writing with TextIO.Write.
- Each element in the output PCollection will represent one line in the resulting text file.
- Dataflow's file-based write operations, like TextIO.Write, write to multiple output files by default. See Writing Output Data for more information.

    PCollection<String> filteredWords = ...;
    // Each String element becomes one line in the output files under this prefix.
    filteredWords.apply(TextIO.Write.named("WriteMyFile")
                                    .to("gs://some/outputData"));

Source: Writing to Text Files
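The quoted documentation leaves out the step the question is really about: turning each TableRow into a properly formatted CSV line before it reaches TextIO.Write. Below is a minimal sketch of that formatting step in plain Java. The class name CsvFormatter, its methods, and the column list are hypothetical and not part of the Dataflow SDK. It assumes a row can be treated as a Map<String, Object> (which TableRow is in the Dataflow SDK) and that NULL values should become empty fields. Quoting follows the usual CSV convention: fields containing a comma, a quote, or a line break are wrapped in double quotes, and embedded quotes are doubled.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper (not part of any SDK): formats one row as a CSV line. */
final class CsvFormatter {
    private CsvFormatter() {}

    /** Quotes a field when it contains a comma, quote, CR or LF; doubles embedded quotes. */
    static String escape(Object value) {
        if (value == null) {
            return ""; // assumption: NULL columns are written as empty fields
        }
        String s = value.toString();
        boolean needsQuoting = s.contains(",") || s.contains("\"")
                || s.contains("\n") || s.contains("\r");
        if (!needsQuoting) {
            return s;
        }
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }

    /** Emits the requested columns in a fixed order so every line has the same layout. */
    static String toCsvLine(Map<String, Object> row, List<String> columns) {
        StringJoiner line = new StringJoiner(",");
        for (String column : columns) {
            line.add(escape(row.get(column)));
        }
        return line.toString();
    }
}
```

Inside the pipeline, a ParDo with a DoFn<TableRow, String> could call something like CsvFormatter.toCsvLine(c.element(), columns) and output the result, giving the PCollection<String> that TextIO.Write requires. Passing the column names explicitly keeps the field order stable, since a row's own key order should not be relied on. To get files ending in .csv, check whether your SDK version's TextIO.Write offers a suffix option such as .withSuffix(".csv").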
