
Dataflow - Windowed writes to BigQuery?

Dataflow - Is there a way to do windowed writes to BigQuery? I am trying to run a Dataflow job that reads 500 million rows from files and then writes them to BigQuery. When I ran it, it did not get past 15 million rows, so I am wondering whether some kind of windowed write to BigQuery would help. While running I got many GC allocation failures, but I understand those are normal. I left the default diskSize configured when running. Please help. If there are any examples of windowed writes to BigQuery, please share them.

As for the transformation, it is just a split of the string, which is then inserted into BigQuery.

Also, does the example below keep writing to BigQuery as it keeps streaming from Pub/Sub? https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/master/src/main/java/com/google/cloud/teleport/templates/PubSubToBigQuery.java
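For what it's worth, BigQueryIO in the Beam Java SDK can write an unbounded source in periodic batch load jobs rather than streaming inserts by setting a triggering frequency. A minimal sketch (a fragment, not a complete pipeline; it assumes the Beam 2.x Java SDK on the classpath, and the 5-minute interval and shard count are made-up values):

```java
// Sketch: periodic, window-like writes for an unbounded PCollection<TableRow>.
// FILE_LOADS batches rows into BigQuery load jobs; withTriggeringFrequency
// controls how often a load job is kicked off, and withNumFileShards is
// required when a triggering frequency is set.
rows.apply(BigQueryIO.writeTableRows()
        .to(options.getSchemaDetails())
        .withSchema(FormatRemindersFn.getSchema())
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
        .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
        .withTriggeringFrequency(Duration.standardMinutes(5)) // hypothetical interval
        .withNumFileShards(100));                             // hypothetical shard count
```

For a bounded source like TextIO, FILE_LOADS is already the default method, so this mainly matters for the streaming Pub/Sub case.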

My sample below

Pipeline pipeline = Pipeline.create(options);
PCollection<String> textData = pipeline.apply("Read Text Data",
        TextIO.read().from(options.getInputFilePattern()));
PCollection<TableRow> tr = textData.apply(ParDo.of(new FormatRemindersFn()));

tr.apply(BigQueryIO.writeTableRows().withoutValidation()
        .withCustomGcsTempLocation(options.getBigQueryLoadingTemporaryDirectory())
        .withSchema(FormatRemindersFn.getSchema())
        //  .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(WriteDisposition.WRITE_APPEND)
        .to(options.getSchemaDetails()));

static class FormatRemindersFn extends DoFn<String, TableRow> {
    @ProcessElement
    public void processElement(ProcessContext c) {
        try {
            if (StringUtils.isNotEmpty(c.element())) {
                String[] fields = c.element().split("\\^", 15);

                //  logger.info("Fields :{}", fields[2]);
                TableRow row = new TableRow().set("MODIFIED_DATE", fields[0])
                        .set("NAME", fields[1])
                        .set("ADDRESS", fields[2]);

                c.output(row);
            }
        } catch (Exception e) {
            logger.error("Error: {}", e.getMessage());
        }
    }
}

The error got resolved after commenting out the per-element logging inside the DoFn. Logging every element should be avoided when processing that many elements.
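If some visibility is still needed, one option is to log only a sample of elements instead of all of them. A minimal sketch of the idea (the class, method names, and sampling interval here are all hypothetical, not part of the original code): with one log line per million elements, a 500-million-row job would emit only around 500 log lines.

```java
// Hypothetical sketch: count how many log lines sampled logging would emit.
public class SampledLogging {
    static final long SAMPLE_EVERY = 1_000_000; // hypothetical sampling interval

    // Returns how many log lines would be emitted for n processed elements
    // when logging only every SAMPLE_EVERY-th element.
    static long countLogged(long n) {
        long logged = 0;
        for (long i = 0; i < n; i++) {
            if (i % SAMPLE_EVERY == 0) {
                logged++; // in a real DoFn, logger.info(...) would go here
            }
        }
        return logged;
    }

    public static void main(String[] args) {
        System.out.println(countLogged(3_000_000)); // prints 3
    }
}
```

Another common alternative is to replace per-element logging with Beam's `Metrics.counter(...)`, which aggregates counts on the workers instead of writing a line per element.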
