简体   繁体   中英

Apache Beam stream processing of json data

I am analyzing Apache Beam stream processing of data. I have worked on Apache Kafka stream processing (Producer, Consumer etc). I want to compare it with Beam now.

I want to to stream simple json data using Apache Beam programmatically (Java).

{"UserID":"1","Address":"XXX","ClassNo":"989","UserName":"Stella","ClassType":"YYY"}

Can someone please guide me or direct me with an example link?

There are multiple aspects of this:

  • first you need to establish where the data is coming from:
    • you need to use some kind of IO in Beam pipeline, see here ;
    • there are a bunch of built in IOs, see the list here ;
    • by using an IO from the above link you will likely get a stream of strings containing those JSON objects;
    • some IOs can natively parse Avro and other formats (PubsubIO), this depends on specific IO implementation;
  • then you may need to transform the data:

    • you will need to create your own PTransform which handles the conversion from a JSON string to your Java class:
      • see the section about PTransforms here ;
    • you can see an example of such transform here :
      • this JsonToRow PTransform accepts a string with JSON object and converts it to a Beam Row using Jackson ObjectMapper;
      • you can either try using the Row object yourself, or you can implement a similar transform to convert JSON strings to your custom Java type instead of Row;
  • you may also take a look at examples folder in Beam source;

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM