
Apache Beam reading Avro files from GCS and writing to BigQuery

I am running a Java job to read Avro files and it fails with an error. Looking for help -

Here is the code -

// Get Avro Schema
String schemaJson = getSchema(options.getAvroSchema());
Schema schema = new Schema.Parser().parse(schemaJson);

// Check schema field types before starting the Dataflow job
checkFieldTypes(schema);

// Create the Pipeline object with the options we defined above.
Pipeline pipeline = Pipeline.create(options);
String bqStr = getBQString(options);
// TableSchema ts = BigQueryAvroUtils.getTableSchema(User.SCHEMA$);
// Convert Avro To CSV
PCollection<GenericRecord> records =
    pipeline.apply(
        "Read Avro files",
        AvroIO.readGenericRecords(schema)
            .from(options.getInputFile()));

records
    .apply(
        "Convert Avro to CSV formatted data",
        ParDo.of(new ConvertAvroToCsv(schemaJson, options.getCsvDelimiter())))
    .apply(
        "Write CSV formatted data",
        TextIO.write().to(options.getOutput())
            .withSuffix(".csv"));

records.apply(
      "Write to BigQuery",
      BigQueryIO.write()
          .to(bqStr)
          .withJsonSchema(schemaJson)
          .withWriteDisposition(WRITE_APPEND)
          .withCreateDisposition(CREATE_IF_NEEDED)
          .withFormatFunction(TABLE_ROW_PARSER));
  // [END bq_write]

Here is the error I am seeing -

2020-06-01 13:14:41 ERROR MonitoringUtil$LoggingHandler:99 - 2020-06-01T07:44:39.240Z: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to org.apache.avro.specific.SpecificRecord
        at com.example.AvroToCsv$1.apply(AvroToCsv.java:1)
        at org.apache.beam.sdk.io.gcp.bigquery.PrepareWrite$1.processElement(PrepareWrite.java:76)

2020-06-01 13:14:52 ERROR MonitoringUtil$LoggingHandler:99 - 2020-06-01T07:44:48.956Z: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to org.apache.avro.specific.SpecificRecord
        at com.example.AvroToCsv$1.apply(AvroToCsv.java:1)
        at org.apache.beam.sdk.io.gcp.bigquery.PrepareWrite$1.processElement(PrepareWrite.java:76)

2020-06-01 13:15:03 ERROR MonitoringUtil$LoggingHandler:99 - 2020-06-01T07:44:58.811Z: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to org.apache.avro.specific.SpecificRecord
        at com.example.AvroToCsv$1.apply(AvroToCsv.java:1)
        at org.apache.beam.sdk.io.gcp.bigquery.PrepareWrite$1.processElement(PrepareWrite.java:76)

2020-06-01 13:15:15 ERROR MonitoringUtil$LoggingHandler:99 - 2020-06-01T07:45:10.673Z: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to org.apache.avro.specific.SpecificRecord
        at com.example.AvroToCsv$1.apply(AvroToCsv.java:1)
        at org.apache.beam.sdk.io.gcp.bigquery.PrepareWrite$1.processElement(PrepareWrite.java:76)

The error is in your TABLE_ROW_PARSER function. It appears to cast the Avro GenericRecord to a SpecificRecord.
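A format function typed on GenericRecord avoids that cast. Below is a minimal sketch; the helper name GENERIC_RECORD_TO_ROW and the field-by-field string conversion are illustrative and assume a flat Avro schema (nested records and logical types would need extra handling):

// Additional imports needed in the pipeline class:
// import com.google.api.services.bigquery.model.TableRow;
// import org.apache.avro.Schema;
// import org.apache.avro.generic.GenericRecord;
// import org.apache.beam.sdk.transforms.SerializableFunction;

// Replace TABLE_ROW_PARSER with a function that reads fields from the GenericRecord.
static final SerializableFunction<GenericRecord, TableRow> GENERIC_RECORD_TO_ROW =
    record -> {
      TableRow row = new TableRow();
      for (Schema.Field field : record.getSchema().getFields()) {
        Object value = record.get(field.name());
        // Avro strings arrive as Utf8, so convert non-null values to String.
        row.set(field.name(), value == null ? null : value.toString());
      }
      return row;
    };

records.apply(
    "Write to BigQuery",
    BigQueryIO.<GenericRecord>write()
        .to(bqStr)
        .withJsonSchema(schemaJson)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withFormatFunction(GENERIC_RECORD_TO_ROW));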

The line that fails in PrepareWrite is the one that invokes the format function you supplied. The format function must convert each input element into a JSON TableRow. For efficiency, it may be better to use withAvroFormatFunction, as sketched below.
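With withAvroFormatFunction, BigQueryIO loads the records as Avro files instead of converting every element to a JSON TableRow. A rough sketch, assuming a Beam version recent enough to provide this method (roughly 2.16 or later):

records.apply(
    "Write to BigQuery",
    BigQueryIO.<GenericRecord>write()
        .to(bqStr)
        .withJsonSchema(schemaJson)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        // The elements are already GenericRecords, so pass them through unchanged.
        .withAvroFormatFunction(request -> request.getElement()));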
