
How to select a set of fields from input data as an array of repeated fields in Beam SQL

Problem Statement:

I have an input PCollection with the following fields:

{
   firstname_1,
   lastname_1,
   dob,
   firstname_2,
   lastname_2,
   firstname_3,
   lastname_3
}

Then I execute a Beam SQL query, and the resulting PCollection should look like this:

 -----------------------------------------------
   name.firstname |  name.lastname | dob
 -----------------------------------------------
      firstname_1 |  lastname_1    | 202009
      firstname_2 |  lastname_2    |
      firstname_3 |  lastname_3    |
 -----------------------------------------------

To be precise:

array[
    (firstname_1,lastname_1,dob),
    (firstname_2,lastname_2,dob),
    (firstname_3,lastname_3,dob)
]
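The target shape can be illustrated outside Beam. The sketch below is plain Java (16+, for records), not Beam SQL, and is only meant to make the desired reshaping concrete: it models one input element as a Map<String, Object>, and the Name record, the FlattenNames class and the toNameArray method are hypothetical names chosen for illustration. For each i it pairs firstname_i with lastname_i and copies the single dob value into every array element, as the array above describes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlattenNames {

    // Hypothetical element type of the output array: one (firstname, lastname, dob) triple.
    public record Name(String firstname, String lastname, String dob) {}

    // For i = 1..count, pairs firstname_i with lastname_i and repeats the
    // element's single dob value in every entry of the resulting list.
    public static List<Name> toNameArray(Map<String, Object> row, int count) {
        Object dob = row.get("dob");
        List<Name> names = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            names.add(new Name(
                    (String) row.get("firstname_" + i),
                    (String) row.get("lastname_" + i),
                    dob == null ? null : dob.toString()));
        }
        return names;
    }

    public static void main(String[] args) {
        // One input element with the flat fields from the question (sample values).
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("firstname_1", "firstname_1");
        row.put("lastname_1", "lastname_1");
        row.put("dob", "202009");
        row.put("firstname_2", "firstname_2");
        row.put("lastname_2", "lastname_2");
        row.put("firstname_3", "firstname_3");
        row.put("lastname_3", "lastname_3");

        // Prints one Name record per firstname_i/lastname_i pair.
        for (Name name : toNameArray(row, 3)) {
            System.out.println(name);
        }
    }
}
```

Running main prints three Name records, each carrying dob 202009. In a real pipeline, the same per-element logic could sit in a MapElements or ParDo step instead of SQL; that is an alternative approach, not what the question asks for.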

Here is the code snippet where I execute Beam SQL:

PCollectionTuple tuple =
    PCollectionTuple.of(new TupleTag<>("testPcollection"), testPcollection);

PCollection<Row> result = tuple
    .apply(SqlTransform.query(
        "SELECT array[(firstname_1,lastname_1,dob), (firstname_2,lastname_2,dob), (firstname_3,lastname_3,dob)]"));

I am not getting the expected results.

Can someone show me how to select an array of repeated fields in Beam SQL?

Your SQL query has a few errors.

  1. You named the input to the SQL query testPcollection, but your query never selects FROM testPcollection. Let us assume you meant to write FROM testPcollection.
  2. You use the syntax (firstname_1, lastname_1, dob) in both your expected output and your query. This is not a valid SQL expression.
