
How to deserialize a JSON array into an Apache Beam PCollection<javaObject>

I have data like this:

[{"ProjectId":1476401625,"ProjectName":"This is project name","ProjectPostcode":4178},{"ProjectId":2343,"ProjectName":"This is project 2 name","ProjectPostcode":5323}]

I need to deserialize it into Java objects, and I am using this code:

PCollection<Project> deserialisedProjectObject = projectFile.apply("Deserialize Projects", ParseJsons.of(Project.class))
        .setCoder(SerializableCoder.of(Project.class));

but I always get this error:

Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.RuntimeException: Failed to parse a com.lendlease.dp.entity.Project from JSON value: [{"ProjectId":1476401625,"ProjectName":"This is project name","ProjectPostcode":4178},{"ProjectId":2343,"ProjectName":"This is project 2 name","ProjectPostcode":5323}]

If I change the code to:

PCollection<Project[]> deserialisedProjectObject = projectFile.apply("Deserialize Projects", ParseJsons.of(Project[].class))
        .setCoder(SerializableCoder.of(Project[].class));

The runner is able to deserialize it, but I need this line to return a collection of Project, not a collection of Project arrays.

Each input element is a single JSON array, i.e. a Project[], so parsing it as Project[] is correct. To get individual Project objects, apply a flat-map transform (in the Java SDK, FlatMapElements) after ParseJsons that outputs each element of the array.
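A minimal sketch of that approach, assuming the Beam Java SDK with the beam-sdks-java-extensions-json-jackson module (which provides ParseJsons) and a runner such as the DirectRunner on the classpath. The Project class and the flatten helper are hypothetical stand-ins for com.lendlease.dp.entity.Project and your own code, and the input is built inline with Create.of instead of being read from a file:

```java
import com.fasterxml.jackson.annotation.JsonProperty;
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.extensions.jackson.ParseJsons;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.FlatMapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptor;

public class FlattenProjects {

    // Hypothetical stand-in for the real Project entity; @JsonProperty maps the JSON keys.
    public static class Project implements Serializable {
        @JsonProperty("ProjectId") public long projectId;
        @JsonProperty("ProjectName") public String projectName;
        @JsonProperty("ProjectPostcode") public int projectPostcode;
    }

    // Hypothetical helper: exposes one parsed array as an Iterable of its elements.
    static List<Project> flatten(Project[] projects) {
        return Arrays.asList(projects);
    }

    public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // One element = one whole JSON array, as in the question.
        PCollection<String> projectFile = p.apply(Create.of(
            "[{\"ProjectId\":1476401625,\"ProjectName\":\"This is project name\",\"ProjectPostcode\":4178},"
                + "{\"ProjectId\":2343,\"ProjectName\":\"This is project 2 name\",\"ProjectPostcode\":5323}]"));

        PCollection<Project> projects = projectFile
            // Parse each array as Project[] (this part already worked).
            .apply("Deserialize Projects", ParseJsons.of(Project[].class))
            .setCoder(SerializableCoder.of(Project[].class))
            // Emit every element of each array as its own PCollection element.
            .apply("Flatten Projects", FlatMapElements
                .into(TypeDescriptor.of(Project.class))
                .via((Project[] arr) -> flatten(arr)))
            .setCoder(SerializableCoder.of(Project.class));

        p.run().waitUntilFinish();
    }
}
```

FlatMapElements fits here because one input element (an array) yields zero or more output elements; a ParDo whose DoFn calls output() once per array element would do the same job.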

As well as ParseJsons, you may want to look at:

JsonToRow

Its output is a PCollection of Row objects with a schema attached, and schemas provide a lot of useful functionality (see "Using schemas" in the Beam documentation). If you also need an actual POJO within the pipeline as well as the Row objects, you can use Convert.fromRows to turn the Rows into POJOs.
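A rough sketch of the JsonToRow route, assuming the same Beam Java SDK setup. It assumes each input string is a single JSON object rather than a whole array (JsonToRow turns each input element into one Row), so the array from the question would need to be split first. The schema, the projectSchema helper, and the Project POJO are hypothetical and simply mirror the JSON keys:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.schemas.JavaFieldSchema;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.annotations.DefaultSchema;
import org.apache.beam.sdk.schemas.transforms.Convert;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.JsonToRow;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

public class JsonToRowExample {

    // Hypothetical POJO; the field names match the JSON keys so the inferred
    // schema lines up with the one built in projectSchema().
    @DefaultSchema(JavaFieldSchema.class)
    public static class Project {
        public long ProjectId;
        public String ProjectName;
        public int ProjectPostcode;
    }

    // Hypothetical helper: the Row schema JsonToRow parses each JSON object into.
    static Schema projectSchema() {
        return Schema.builder()
            .addInt64Field("ProjectId")
            .addStringField("ProjectName")
            .addInt32Field("ProjectPostcode")
            .build();
    }

    public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // One JSON object per element, not one array per element.
        PCollection<Row> rows = p
            .apply(Create.of(
                "{\"ProjectId\":1476401625,\"ProjectName\":\"This is project name\",\"ProjectPostcode\":4178}",
                "{\"ProjectId\":2343,\"ProjectName\":\"This is project 2 name\",\"ProjectPostcode\":5323}"))
            .apply(JsonToRow.withSchema(projectSchema()));

        // Only needed if a POJO is required downstream; schema transforms work on Rows directly.
        PCollection<Project> projects = rows.apply(Convert.fromRows(Project.class));

        p.run().waitUntilFinish();
    }
}
```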
