简体   繁体   中英

Big Query Multiple Independent Arrays of Structures

I am trying to create two arrays of structures that are independent of each other.

I am trying to get these arrays created from another table that is an import of a flat file with different record types depending on the data (Rec0015 - Earnings and Rec0025 - Deductions). Each record type has a slightly different layout, so I import the record data as a string that is "~" delimited. I have simplified my examples below to only show the basic data needed to illustrate my issues.

If I run the queries independently of each other each array is created properly; however, when I combine both I cannot get it to work.

For the first query:

SELECT 
     flat.EmployeeNumber,
     ARRAY_AGG(STRUCT(
          flat.RecordNumberEarn,
          flat.EmployeeEarningsAmount
     ))
     as EmployeeEarningsDetail
FROM 
(SELECT DISTINCT
     stage.EmployeeNumber,
     Rec0015.RecordNumberEarn,
     Rec0015.EmployeeEarningsAmount,
FROM  bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging stage
INNER JOIN 
(SELECT EmployeeNumber,
 SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberEarn,
 SPLIT(VariableData, '~')[SAFE_OFFSET(4)] AS EmployeeEarningsAmount
 FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
 WHERE RecordTypeNumber = '0015'
) Rec0015
ON stage.EmployeeNumber = Rec0015.EmployeeNumber  
) as flat 
GROUP BY
     flat.EmployeeNumber

I get the following result (which is correct):

EmployeeNumber  EmployeeEarningsDetail
xxxx521     "{
  ""EmployeeEarningsDetail"": [{
    ""RecordNumberEarn"": ""0001"",
    ""EmployeeEarningsAmount"": ""      375.52""
  }, {
    ""RecordNumberEarn"": ""0002"",
    ""EmployeeEarningsAmount"": ""      387.26""
  }]
}" ....

When I run the "full" query:

SELECT 
     flat2.EmployeeNumber,
     ARRAY_AGG(STRUCT(
          Rec0025.RecordNumberDed,
          Rec0025.EmployeePayDeductionAmount
     ))
     as EmployeeDeductionsDetail
FROM 

(SELECT 
     flat.EmployeeNumber,
     ARRAY_AGG(STRUCT(
          flat.RecordNumberEarn,
          flat.EmployeeEarningsAmount
     ))
     as EmployeeEarningsDetail
FROM 
(SELECT DISTINCT
     stage.EmployeeNumber,
     Rec0015.RecordNumberEarn,
     Rec0015.EmployeeEarningsAmount,
FROM  bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging stage
INNER JOIN 
(SELECT EmployeeNumber,
 SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberEarn,
 SPLIT(VariableData, '~')[SAFE_OFFSET(4)] AS EmployeeEarningsAmount
 FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
 WHERE RecordTypeNumber = '0015'
) Rec0015
ON stage.EmployeeNumber = Rec0015.EmployeeNumber  
) as flat 
GROUP BY
     flat.EmployeeNumber
) as flat2
INNER JOIN 
(SELECT DISTINCT EmployeeNumber,
 SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberDed,
 SPLIT(VariableData, '~')[SAFE_OFFSET(5)] AS EmployeePayDeductionAmount
 FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
 WHERE RecordTypeNumber = '0025'
) Rec0025 
ON flat2.EmployeeNumber = Rec0025.EmployeeNumber
GROUP BY
     flat2.EmployeeNumber

I get the following result (which is correct, but doesn't include the first array structure):

EmployeeNumber  EmployeeDeductionsDetail
xxxx521     "{
  ""EmployeeDeductionsDetail"": [{
    ""RecordNumberDed"": ""0001"",
    ""EmployeePayDeductionAmount"": ""       50.65""
  }, {
    ""RecordNumberDed"": ""0002"",
    ""EmployeePayDeductionAmount"": ""       44.15""
  }, {
    ""RecordNumberDed"": ""0003"",
    ""EmployeePayDeductionAmount"": ""       44.15""
  }, {
    ""RecordNumberDed"": ""0004"",
    ""EmployeePayDeductionAmount"": ""       10.33""
  }, {
    ""RecordNumberDed"": ""0005"",
    ""EmployeePayDeductionAmount"": ""       10.33""
  }, {
    ""RecordNumberDed"": ""0006"",
    ""EmployeePayDeductionAmount"": ""       61.54""
  }, {
    ""RecordNumberDed"": ""0007"",
    ""EmployeePayDeductionAmount"": ""       13.22""
  }, {
    ""RecordNumberDed"": ""0008"",
    ""EmployeePayDeductionAmount"": ""        7.84""
  }, {
    ""RecordNumberDed"": ""0009"",
    ""EmployeePayDeductionAmount"": ""        0.69""
  }, {
    ""RecordNumberDed"": ""0010"",
    ""EmployeePayDeductionAmount"": ""        5.00""
  }]
}" ...

However, in the 2nd query, I really want to do a group by on the array structure EmployeeEarningsDetail, but when I add the array to the select and group by I get the error:

"Grouping by expressions of type ARRAY is not allowed."

I tried adding a TO_JSON_STRING(EmployeeEarningsDetail) in both the Select and the Group by, but I got a column of just the string not as an array as below:

SELECT 
     flat2.EmployeeNumber,
     TO_JSON_STRING(flat2.EmployeeEarningsDetail),
     ARRAY_AGG(STRUCT(
          Rec0025.RecordNumberDed,
          Rec0025.EmployeePayDeductionAmount
     ))
     as EmployeeDeductionsDetail
FROM 

(SELECT 
     flat.EmployeeNumber,
     ARRAY_AGG(STRUCT(
          flat.RecordNumberEarn,
          flat.EmployeeEarningsAmount
     ))
     as EmployeeEarningsDetail
FROM 
(SELECT DISTINCT
     stage.EmployeeNumber,
     Rec0015.RecordNumberEarn,
     Rec0015.EmployeeEarningsAmount,
FROM  bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging stage
INNER JOIN 
(SELECT EmployeeNumber,
 SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberEarn,
 SPLIT(VariableData, '~')[SAFE_OFFSET(4)] AS EmployeeEarningsAmount
 FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
 WHERE RecordTypeNumber = '0015'
) Rec0015
ON stage.EmployeeNumber = Rec0015.EmployeeNumber  
) as flat 
GROUP BY
     flat.EmployeeNumber
) as flat2
INNER JOIN 
(SELECT DISTINCT EmployeeNumber,
 SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberDed,
 SPLIT(VariableData, '~')[SAFE_OFFSET(5)] AS EmployeePayDeductionAmount
 FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
 WHERE RecordTypeNumber = '0025'
) Rec0025 
ON flat2.EmployeeNumber = Rec0025.EmployeeNumber
GROUP BY
     flat2.EmployeeNumber,
     TO_JSON_STRING(flat2.EmployeeEarningsDetail)

The results were (not correct at all) are below the JSON shows an f0_ string of all the earnings and the first Deduction followed by another row with the rest of the deductions in an array See below:

EmployeeNumber  f0_ EmployeeDeductionsDetail
xxxx521     "[{""RecordNumberEarn"":""0001"",""EmployeeEarningsAmount"":""      375.52""},{""RecordNumberEarn"":""0002"",""EmployeeEarningsAmount"":""      387.26""}]" "{
  ""EmployeeDeductionsDetail"": [{
    ""RecordNumberDed"": ""0001"",
    ""EmployeePayDeductionAmount"": ""       50.65""
  }, {
    ""RecordNumberDed"": ""0002"",
    ""EmployeePayDeductionAmount"": ""       44.15""
  }, {
    ""RecordNumberDed"": ""0003"",
    ""EmployeePayDeductionAmount"": ""       44.15""
  }, {
    ""RecordNumberDed"": ""0004"",
    ""EmployeePayDeductionAmount"": ""       10.33""
  }, {
    ""RecordNumberDed"": ""0005"",
    ""EmployeePayDeductionAmount"": ""       10.33""
  }, {
    ""RecordNumberDed"": ""0006"",
    ""EmployeePayDeductionAmount"": ""       61.54""
  }, {
    ""RecordNumberDed"": ""0007"",
    ""EmployeePayDeductionAmount"": ""       13.22""
  }, {
    ""RecordNumberDed"": ""0008"",
    ""EmployeePayDeductionAmount"": ""        7.84""
  }, {
    ""RecordNumberDed"": ""0009"",
    ""EmployeePayDeductionAmount"": ""        0.69""
  }, {
    ""RecordNumberDed"": ""0010"",
    ""EmployeePayDeductionAmount"": ""        5.00""
  }]
}" ...

I have tried to put both ARRAY_AGGs in the same SELECT my arrays are cartesian products:

The SQL is:

SELECT 
     flat.EmployeeNumber,
     ARRAY_AGG(STRUCT(
          flat.RecordNumberEarn,
          flat.EmployeeEarningsAmount
     ))
     as EmployeeEarningsDetail,
     ARRAY_AGG(STRUCT(
          flat.RecordNumberDed,
          flat.EmployeePayDeductionAmount
     ))
     as EmployeeDeductionsDetail
FROM 
(SELECT DISTINCT
     stage.EmployeeNumber,
     Rec0015.RecordNumberEarn,
     Rec0015.EmployeeEarningsAmount,
     Rec0025.RecordNumberDed,
     Rec0025.EmployeePayDeductionAmount
FROM  bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging stage
INNER JOIN 
(SELECT EmployeeNumber,
 SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberEarn,
 SPLIT(VariableData, '~')[SAFE_OFFSET(4)] AS EmployeeEarningsAmount
 FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
 WHERE RecordTypeNumber = '0015'
) Rec0015
ON stage.EmployeeNumber = Rec0015.EmployeeNumber  
INNER JOIN 
(SELECT DISTINCT EmployeeNumber,
 SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberDed,
 SPLIT(VariableData, '~')[SAFE_OFFSET(5)] AS EmployeePayDeductionAmount
 FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
 WHERE RecordTypeNumber = '0025'
) Rec0025 
ON stage.EmployeeNumber = Rec0025.EmployeeNumber
) as flat
GROUP BY
     flat.EmployeeNumber

and the results are:

EmployeeNumber  EmployeeEarningsDetail  EmployeeDeductionsDetail
xxxx521     "{
  ""EmployeeEarningsDetail"": [{
    ""RecordNumberEarn"": ""0001"",
    ""EmployeeEarningsAmount"": ""      375.52""
  }, {
    ""RecordNumberEarn"": ""0001"",
    ""EmployeeEarningsAmount"": ""      375.52""
  }, {
    ""RecordNumberEarn"": ""0001"",
    ""EmployeeEarningsAmount"": ""      375.52""
  }, {
    ""RecordNumberEarn"": ""0001"",
    ""EmployeeEarningsAmount"": ""      375.52""
  }, {
    ""RecordNumberEarn"": ""0001"",
    ""EmployeeEarningsAmount"": ""      375.52""
  }, {
    ""RecordNumberEarn"": ""0001"",
    ""EmployeeEarningsAmount"": ""      375.52""
  }, {
    ""RecordNumberEarn"": ""0001"",
    ""EmployeeEarningsAmount"": ""      375.52""
  }, {
    ""RecordNumberEarn"": ""0001"",
    ""EmployeeEarningsAmount"": ""      375.52""
  }, {
    ""RecordNumberEarn"": ""0001"",
    ""EmployeeEarningsAmount"": ""      375.52""
  }, {
    ""RecordNumberEarn"": ""0001"",
    ""EmployeeEarningsAmount"": ""      375.52""
  }, {
    ""RecordNumberEarn"": ""0002"",
    ""EmployeeEarningsAmount"": ""      387.26""
  }, {
    ""RecordNumberEarn"": ""0002"",
    ""EmployeeEarningsAmount"": ""      387.26""
  }, {
    ""RecordNumberEarn"": ""0002"",
    ""EmployeeEarningsAmount"": ""      387.26""
  }, {
    ""RecordNumberEarn"": ""0002"",
    ""EmployeeEarningsAmount"": ""      387.26""
  }, {
    ""RecordNumberEarn"": ""0002"",
    ""EmployeeEarningsAmount"": ""      387.26""
  }, {
    ""RecordNumberEarn"": ""0002"",
    ""EmployeeEarningsAmount"": ""      387.26""
  }, {
    ""RecordNumberEarn"": ""0002"",
    ""EmployeeEarningsAmount"": ""      387.26""
  }, {
    ""RecordNumberEarn"": ""0002"",
    ""EmployeeEarningsAmount"": ""      387.26""
  }, {
    ""RecordNumberEarn"": ""0002"",
    ""EmployeeEarningsAmount"": ""      387.26""
  }, {
    ""RecordNumberEarn"": ""0002"",
    ""EmployeeEarningsAmount"": ""      387.26""
  }]
}"  "{
  ""EmployeeDeductionsDetail"": [{
    ""RecordNumberDed"": ""0001"",
    ""EmployeePayDeductionAmount"": ""       50.65""
  }, {
    ""RecordNumberDed"": ""0002"",
    ""EmployeePayDeductionAmount"": ""       44.15""
  }, {
    ""RecordNumberDed"": ""0003"",
    ""EmployeePayDeductionAmount"": ""       44.15""
  }, {
    ""RecordNumberDed"": ""0004"",
    ""EmployeePayDeductionAmount"": ""       10.33""
  }, {
    ""RecordNumberDed"": ""0005"",
    ""EmployeePayDeductionAmount"": ""       10.33""
  }, {
    ""RecordNumberDed"": ""0006"",
    ""EmployeePayDeductionAmount"": ""       61.54""
  }, {
    ""RecordNumberDed"": ""0007"",
    ""EmployeePayDeductionAmount"": ""       13.22""
  }, {
    ""RecordNumberDed"": ""0008"",
    ""EmployeePayDeductionAmount"": ""        7.84""
  }, {
    ""RecordNumberDed"": ""0009"",
    ""EmployeePayDeductionAmount"": ""        0.69""
  }, {
    ""RecordNumberDed"": ""0010"",
    ""EmployeePayDeductionAmount"": ""        5.00""
  }, {
    ""RecordNumberDed"": ""0001"",
    ""EmployeePayDeductionAmount"": ""       50.65""
  }, {
    ""RecordNumberDed"": ""0002"",
    ""EmployeePayDeductionAmount"": ""       44.15""
  }, {
    ""RecordNumberDed"": ""0003"",
    ""EmployeePayDeductionAmount"": ""       44.15""
  }, {
    ""RecordNumberDed"": ""0004"",
    ""EmployeePayDeductionAmount"": ""       10.33""
  }, {
    ""RecordNumberDed"": ""0005"",
    ""EmployeePayDeductionAmount"": ""       10.33""
  }, {
    ""RecordNumberDed"": ""0006"",
    ""EmployeePayDeductionAmount"": ""       61.54""
  }, {
    ""RecordNumberDed"": ""0007"",
    ""EmployeePayDeductionAmount"": ""       13.22""
  }, {
    ""RecordNumberDed"": ""0008"",
    ""EmployeePayDeductionAmount"": ""        7.84""
  }, {
    ""RecordNumberDed"": ""0009"",
    ""EmployeePayDeductionAmount"": ""        0.69""
  }, {
    ""RecordNumberDed"": ""0010"",
    ""EmployeePayDeductionAmount"": ""        5.00""
  }]
}" ...

Any suggestions on how I can "fix" this.

Thanks, David

A workaround for GROUP BY when the type does not support aggregation, eg STRUCT or GEOGRAPHY is to turn it to a STRING key. TO_JSON_STRING can do it for STRUCT, ST_AsText can do it for GEOGRAPHY, etc. Note this might not be very performant.

To get the original value, without any modification, use ANY_VALUE aggregation function - it picks arbitrary value from its inputs, here all the values are presumably the same, so we don't care which one.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM