I am using Dataflow 1.9 (Java SDK) to read a Pub/Sub message and stream it into BigQuery without explicitly setting each column in a TableRow. Below is the code snippet for the conversion.
PCollection<TableRow> payloadTableRow = pipeline
    .apply("Read", PubsubIO.Read.subscription(***MY_SUBSCRIPTION***)
        .withCoder(TableRowJsonCoder.of()));
The above code works perfectly: the Pub/Sub messages on a topic get converted to a PCollection<TableRow> and then written to BigQuery using BigQueryIO.Write.
When I try to do the same in Apache Beam, I cannot set the TableRowJsonCoder for a Pub/Sub message, because Beam's PubsubIO lacks the withCoder() method. I tried setCoder() as below but get a compilation error. I even tried PubsubIO.readStrings(), but the error stays the same.
pipeline
    .apply("Read", PubsubIO.readMessagesWithAttributes()
        .fromSubscription(***MY_SUBSCRIPTION***))
    .setCoder(TableRowJsonCoder.of())
I can see that withCoder() exists in Dataflow 1.9, but this missing feature is blocking my upgrade to Beam.
My questions are: does Beam's PubsubIO class have anything similar to withCoder() so that I can move to Beam? And is there another way in PubsubIO to get this implicit conversion via TableRowJsonCoder.of()?
As Kenn Knowles rightly pointed out, I have used MapElements to pull out the byte[] payload and then transform it into a TableRow, as below.
PCollection<byte[]> payloadByteArray = payloadInPubSubMessage.apply(
    MapElements.via(new SimpleFunction<PubsubMessage, byte[]>() {
      @Override
      public byte[] apply(PubsubMessage input) {
        return input.getPayload();
      }
    }));

PCollection<TableRow> payloadTableRow = payloadByteArray.apply(
    MapElements.via(new SimpleFunction<byte[], TableRow>() {
      @Override
      public TableRow apply(byte[] input) {
        TableRow tableRow = null;
        try {
          // This decode call is where the exception below is thrown.
          tableRow = TableRowJsonCoder.of().decode(new ByteArrayInputStream(input));
        } catch (Exception ex) {
          ex.printStackTrace();
        }
        return tableRow;
      }
    }));
Now I am encountering an EOFException while transforming the byte array to a TableRow with TableRowJsonCoder.of().decode(). I sensed I was missing some sort of Coder for TableRow and registered one as below.
CoderRegistry registry = pipeline.getCoderRegistry();
registry.registerCoderForClass(TableRow.class, TableRowJsonCoder.of());
This does not solve the issue, and I would like some insight into the error below:
Caused by: org.apache.beam.sdk.coders.CoderException: java.io.EOFException
at org.apache.beam.sdk.coders.StringUtf8Coder.decode(StringUtf8Coder.java:110)
at org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder.decode(TableRowJsonCoder.java:61)
at org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder.decode(TableRowJsonCoder.java:55)
at com.gcp.poc.transformers.TableRowTransformer.processElement(TableRowTransformer.java:48)
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at org.apache.beam.sdk.coders.StringUtf8Coder.readString(StringUtf8Coder.java:63)
at org.apache.beam.sdk.coders.StringUtf8Coder.decode(StringUtf8Coder.java:106)
at org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder.decode(TableRowJsonCoder.java:61)
at org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder.decode(TableRowJsonCoder.java:55)
I hope this makes sense, and I would love to get a solution for this TableRow decoding issue.
In Beam, IO connectors are simplified to output their most natural type. For PubsubIO that is PubsubMessage. From there, you can perform arbitrary processing on the messages.

For your specific example, you would use PubsubIO.readMessages() followed by MapElements to pull out the byte[] payload and parse it into a TableRow.
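That shape can be sketched as below. This is a minimal sketch, not your exact pipeline: it assumes Beam 2.x with Jackson on the classpath, parses the payload as JSON directly (rather than through the coder), and the class name PubsubToTableRowFn is made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.transforms.SimpleFunction;

/** Sketch: turn a Pub/Sub message carrying raw JSON into a TableRow. */
public class PubsubToTableRowFn extends SimpleFunction<PubsubMessage, TableRow> {
  private static final ObjectMapper MAPPER = new ObjectMapper();

  @Override
  public TableRow apply(PubsubMessage message) {
    try {
      // Parse the raw JSON payload directly instead of running it through
      // TableRowJsonCoder, whose default (nested) encoding expects a
      // length prefix that a plain JSON payload does not have.
      String json = new String(message.getPayload(), StandardCharsets.UTF_8);
      return MAPPER.readValue(json, TableRow.class);
    } catch (IOException e) {
      throw new RuntimeException("Payload is not valid TableRow JSON", e);
    }
  }
}

// Wiring it up, per the above:
// PCollection<TableRow> rows = pipeline
//     .apply("Read", PubsubIO.readMessages().fromSubscription(***MY_SUBSCRIPTION***))
//     .apply("ParseRow", MapElements.via(new PubsubToTableRowFn()));
```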
The TableRowJsonCoder describes how to encode/decode elements of type TableRow when passing them between points in a pipeline. Instead of calling TableRowJsonCoder.of().decode(...) within your MapElements, you should actually examine the bytes you have received from Pub/Sub and parse them into some meaningful form. This could be creating a TableRow using the various methods for doing so, as shown in the Beam examples such as BigQueryTornadoes.
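As an aside on the EOFException itself: the no-argument decode uses the coder's NESTED context, in which StringUtf8Coder first reads a length prefix, but a raw Pub/Sub payload has none, so the stream runs dry. If you did want to keep using the coder, decoding in the OUTER context (which reads the stream to its end) avoids that framing mismatch. A sketch, assuming Beam 2.x, where the two-argument decode is deprecated but still present:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;

public class OuterContextDecode {
  public static TableRow decodeRaw(byte[] payload) throws IOException {
    // OUTER context: consume the whole stream as one UTF-8 string.
    // The default (NESTED) context would expect a length prefix first,
    // which is exactly what triggers the EOFException on raw JSON.
    return TableRowJsonCoder.of()
        .decode(new ByteArrayInputStream(payload), Coder.Context.OUTER);
  }
}
```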