BigQuery: Multiple Independent Arrays of Structures

I am trying to create two arrays of structures that are independent of each other.

I am trying to build these arrays from another table, which is an import of a flat file whose record types differ depending on the data (Rec0015 - Earnings and Rec0025 - Deductions). Each record type has a slightly different layout, so I import the record data as a "~"-delimited string. I have simplified my examples below to show only the basic data needed to illustrate my issue.

If I run the queries independently of each other, each array is created properly; however, when I combine both, I cannot get it to work.

For the first query:

SELECT 
     flat.EmployeeNumber,
     ARRAY_AGG(STRUCT(
          flat.RecordNumberEarn,
          flat.EmployeeEarningsAmount
     ))
     as EmployeeEarningsDetail
FROM 
(SELECT DISTINCT
     stage.EmployeeNumber,
     Rec0015.RecordNumberEarn,
     Rec0015.EmployeeEarningsAmount,
FROM  bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging stage
INNER JOIN 
(SELECT EmployeeNumber,
 SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberEarn,
 SPLIT(VariableData, '~')[SAFE_OFFSET(4)] AS EmployeeEarningsAmount
 FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
 WHERE RecordTypeNumber = '0015'
) Rec0015
ON stage.EmployeeNumber = Rec0015.EmployeeNumber  
) as flat 
GROUP BY
     flat.EmployeeNumber

I get the following result (which is correct):

EmployeeNumber  EmployeeEarningsDetail
xxxx521     "{
  ""EmployeeEarningsDetail"": [{
    ""RecordNumberEarn"": ""0001"",
    ""EmployeeEarningsAmount"": ""      375.52""
  }, {
    ""RecordNumberEarn"": ""0002"",
    ""EmployeeEarningsAmount"": ""      387.26""
  }]
}" ....

When I run the "full" query:

SELECT 
     flat2.EmployeeNumber,
     ARRAY_AGG(STRUCT(
          Rec0025.RecordNumberDed,
          Rec0025.EmployeePayDeductionAmount
     ))
     as EmployeeDeductionsDetail
FROM 

(SELECT 
     flat.EmployeeNumber,
     ARRAY_AGG(STRUCT(
          flat.RecordNumberEarn,
          flat.EmployeeEarningsAmount
     ))
     as EmployeeEarningsDetail
FROM 
(SELECT DISTINCT
     stage.EmployeeNumber,
     Rec0015.RecordNumberEarn,
     Rec0015.EmployeeEarningsAmount,
FROM  bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging stage
INNER JOIN 
(SELECT EmployeeNumber,
 SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberEarn,
 SPLIT(VariableData, '~')[SAFE_OFFSET(4)] AS EmployeeEarningsAmount
 FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
 WHERE RecordTypeNumber = '0015'
) Rec0015
ON stage.EmployeeNumber = Rec0015.EmployeeNumber  
) as flat 
GROUP BY
     flat.EmployeeNumber
) as flat2
INNER JOIN 
(SELECT DISTINCT EmployeeNumber,
 SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberDed,
 SPLIT(VariableData, '~')[SAFE_OFFSET(5)] AS EmployeePayDeductionAmount
 FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
 WHERE RecordTypeNumber = '0025'
) Rec0025 
ON flat2.EmployeeNumber = Rec0025.EmployeeNumber
GROUP BY
     flat2.EmployeeNumber

I get the following result (which is correct, but doesn't include the first array structure):

EmployeeNumber  EmployeeDeductionsDetail
xxxx521     "{
  ""EmployeeDeductionsDetail"": [{
    ""RecordNumberDed"": ""0001"",
    ""EmployeePayDeductionAmount"": ""       50.65""
  }, {
    ""RecordNumberDed"": ""0002"",
    ""EmployeePayDeductionAmount"": ""       44.15""
  }, {
    ""RecordNumberDed"": ""0003"",
    ""EmployeePayDeductionAmount"": ""       44.15""
  }, {
    ""RecordNumberDed"": ""0004"",
    ""EmployeePayDeductionAmount"": ""       10.33""
  }, {
    ""RecordNumberDed"": ""0005"",
    ""EmployeePayDeductionAmount"": ""       10.33""
  }, {
    ""RecordNumberDed"": ""0006"",
    ""EmployeePayDeductionAmount"": ""       61.54""
  }, {
    ""RecordNumberDed"": ""0007"",
    ""EmployeePayDeductionAmount"": ""       13.22""
  }, {
    ""RecordNumberDed"": ""0008"",
    ""EmployeePayDeductionAmount"": ""        7.84""
  }, {
    ""RecordNumberDed"": ""0009"",
    ""EmployeePayDeductionAmount"": ""        0.69""
  }, {
    ""RecordNumberDed"": ""0010"",
    ""EmployeePayDeductionAmount"": ""        5.00""
  }]
}" ...

However, in the second query, I really want to group by the array structure EmployeeEarningsDetail as well, but when I add the array to the SELECT and the GROUP BY, I get the error:

"Grouping by expressions of type ARRAY is not allowed."

I tried adding TO_JSON_STRING(EmployeeEarningsDetail) to both the SELECT and the GROUP BY, but I got a column containing just the string, not an array, as below:

SELECT 
     flat2.EmployeeNumber,
     TO_JSON_STRING(flat2.EmployeeEarningsDetail),
     ARRAY_AGG(STRUCT(
          Rec0025.RecordNumberDed,
          Rec0025.EmployeePayDeductionAmount
     ))
     as EmployeeDeductionsDetail
FROM 

(SELECT 
     flat.EmployeeNumber,
     ARRAY_AGG(STRUCT(
          flat.RecordNumberEarn,
          flat.EmployeeEarningsAmount
     ))
     as EmployeeEarningsDetail
FROM 
(SELECT DISTINCT
     stage.EmployeeNumber,
     Rec0015.RecordNumberEarn,
     Rec0015.EmployeeEarningsAmount,
FROM  bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging stage
INNER JOIN 
(SELECT EmployeeNumber,
 SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberEarn,
 SPLIT(VariableData, '~')[SAFE_OFFSET(4)] AS EmployeeEarningsAmount
 FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
 WHERE RecordTypeNumber = '0015'
) Rec0015
ON stage.EmployeeNumber = Rec0015.EmployeeNumber  
) as flat 
GROUP BY
     flat.EmployeeNumber
) as flat2
INNER JOIN 
(SELECT DISTINCT EmployeeNumber,
 SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberDed,
 SPLIT(VariableData, '~')[SAFE_OFFSET(5)] AS EmployeePayDeductionAmount
 FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
 WHERE RecordTypeNumber = '0025'
) Rec0025 
ON flat2.EmployeeNumber = Rec0025.EmployeeNumber
GROUP BY
     flat2.EmployeeNumber,
     TO_JSON_STRING(flat2.EmployeeEarningsDetail)

The results, shown below, are not correct at all: the output has an f0_ string column holding all the earnings alongside the first deduction, followed by another row with the rest of the deductions in an array:

EmployeeNumber  f0_ EmployeeDeductionsDetail
xxxx521     "[{""RecordNumberEarn"":""0001"",""EmployeeEarningsAmount"":""      375.52""},{""RecordNumberEarn"":""0002"",""EmployeeEarningsAmount"":""      387.26""}]" "{
  ""EmployeeDeductionsDetail"": [{
    ""RecordNumberDed"": ""0001"",
    ""EmployeePayDeductionAmount"": ""       50.65""
  }, {
    ""RecordNumberDed"": ""0002"",
    ""EmployeePayDeductionAmount"": ""       44.15""
  }, {
    ""RecordNumberDed"": ""0003"",
    ""EmployeePayDeductionAmount"": ""       44.15""
  }, {
    ""RecordNumberDed"": ""0004"",
    ""EmployeePayDeductionAmount"": ""       10.33""
  }, {
    ""RecordNumberDed"": ""0005"",
    ""EmployeePayDeductionAmount"": ""       10.33""
  }, {
    ""RecordNumberDed"": ""0006"",
    ""EmployeePayDeductionAmount"": ""       61.54""
  }, {
    ""RecordNumberDed"": ""0007"",
    ""EmployeePayDeductionAmount"": ""       13.22""
  }, {
    ""RecordNumberDed"": ""0008"",
    ""EmployeePayDeductionAmount"": ""        7.84""
  }, {
    ""RecordNumberDed"": ""0009"",
    ""EmployeePayDeductionAmount"": ""        0.69""
  }, {
    ""RecordNumberDed"": ""0010"",
    ""EmployeePayDeductionAmount"": ""        5.00""
  }]
}" ...

I have also tried putting both ARRAY_AGGs in the same SELECT, but then my arrays are Cartesian products.

The SQL is:

SELECT 
     flat.EmployeeNumber,
     ARRAY_AGG(STRUCT(
          flat.RecordNumberEarn,
          flat.EmployeeEarningsAmount
     ))
     as EmployeeEarningsDetail,
     ARRAY_AGG(STRUCT(
          flat.RecordNumberDed,
          flat.EmployeePayDeductionAmount
     ))
     as EmployeeDeductionsDetail
FROM 
(SELECT DISTINCT
     stage.EmployeeNumber,
     Rec0015.RecordNumberEarn,
     Rec0015.EmployeeEarningsAmount,
     Rec0025.RecordNumberDed,
     Rec0025.EmployeePayDeductionAmount
FROM  bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging stage
INNER JOIN 
(SELECT EmployeeNumber,
 SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberEarn,
 SPLIT(VariableData, '~')[SAFE_OFFSET(4)] AS EmployeeEarningsAmount
 FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
 WHERE RecordTypeNumber = '0015'
) Rec0015
ON stage.EmployeeNumber = Rec0015.EmployeeNumber  
INNER JOIN 
(SELECT DISTINCT EmployeeNumber,
 SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberDed,
 SPLIT(VariableData, '~')[SAFE_OFFSET(5)] AS EmployeePayDeductionAmount
 FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
 WHERE RecordTypeNumber = '0025'
) Rec0025 
ON stage.EmployeeNumber = Rec0025.EmployeeNumber
) as flat
GROUP BY
     flat.EmployeeNumber

and the results are:

EmployeeNumber  EmployeeEarningsDetail  EmployeeDeductionsDetail
xxxx521     "{
  ""EmployeeEarningsDetail"": [{
    ""RecordNumberEarn"": ""0001"",
    ""EmployeeEarningsAmount"": ""      375.52""
  }, {
    ""RecordNumberEarn"": ""0001"",
    ""EmployeeEarningsAmount"": ""      375.52""
  }, {
    ""RecordNumberEarn"": ""0001"",
    ""EmployeeEarningsAmount"": ""      375.52""
  }, {
    ""RecordNumberEarn"": ""0001"",
    ""EmployeeEarningsAmount"": ""      375.52""
  }, {
    ""RecordNumberEarn"": ""0001"",
    ""EmployeeEarningsAmount"": ""      375.52""
  }, {
    ""RecordNumberEarn"": ""0001"",
    ""EmployeeEarningsAmount"": ""      375.52""
  }, {
    ""RecordNumberEarn"": ""0001"",
    ""EmployeeEarningsAmount"": ""      375.52""
  }, {
    ""RecordNumberEarn"": ""0001"",
    ""EmployeeEarningsAmount"": ""      375.52""
  }, {
    ""RecordNumberEarn"": ""0001"",
    ""EmployeeEarningsAmount"": ""      375.52""
  }, {
    ""RecordNumberEarn"": ""0001"",
    ""EmployeeEarningsAmount"": ""      375.52""
  }, {
    ""RecordNumberEarn"": ""0002"",
    ""EmployeeEarningsAmount"": ""      387.26""
  }, {
    ""RecordNumberEarn"": ""0002"",
    ""EmployeeEarningsAmount"": ""      387.26""
  }, {
    ""RecordNumberEarn"": ""0002"",
    ""EmployeeEarningsAmount"": ""      387.26""
  }, {
    ""RecordNumberEarn"": ""0002"",
    ""EmployeeEarningsAmount"": ""      387.26""
  }, {
    ""RecordNumberEarn"": ""0002"",
    ""EmployeeEarningsAmount"": ""      387.26""
  }, {
    ""RecordNumberEarn"": ""0002"",
    ""EmployeeEarningsAmount"": ""      387.26""
  }, {
    ""RecordNumberEarn"": ""0002"",
    ""EmployeeEarningsAmount"": ""      387.26""
  }, {
    ""RecordNumberEarn"": ""0002"",
    ""EmployeeEarningsAmount"": ""      387.26""
  }, {
    ""RecordNumberEarn"": ""0002"",
    ""EmployeeEarningsAmount"": ""      387.26""
  }, {
    ""RecordNumberEarn"": ""0002"",
    ""EmployeeEarningsAmount"": ""      387.26""
  }]
}"  "{
  ""EmployeeDeductionsDetail"": [{
    ""RecordNumberDed"": ""0001"",
    ""EmployeePayDeductionAmount"": ""       50.65""
  }, {
    ""RecordNumberDed"": ""0002"",
    ""EmployeePayDeductionAmount"": ""       44.15""
  }, {
    ""RecordNumberDed"": ""0003"",
    ""EmployeePayDeductionAmount"": ""       44.15""
  }, {
    ""RecordNumberDed"": ""0004"",
    ""EmployeePayDeductionAmount"": ""       10.33""
  }, {
    ""RecordNumberDed"": ""0005"",
    ""EmployeePayDeductionAmount"": ""       10.33""
  }, {
    ""RecordNumberDed"": ""0006"",
    ""EmployeePayDeductionAmount"": ""       61.54""
  }, {
    ""RecordNumberDed"": ""0007"",
    ""EmployeePayDeductionAmount"": ""       13.22""
  }, {
    ""RecordNumberDed"": ""0008"",
    ""EmployeePayDeductionAmount"": ""        7.84""
  }, {
    ""RecordNumberDed"": ""0009"",
    ""EmployeePayDeductionAmount"": ""        0.69""
  }, {
    ""RecordNumberDed"": ""0010"",
    ""EmployeePayDeductionAmount"": ""        5.00""
  }, {
    ""RecordNumberDed"": ""0001"",
    ""EmployeePayDeductionAmount"": ""       50.65""
  }, {
    ""RecordNumberDed"": ""0002"",
    ""EmployeePayDeductionAmount"": ""       44.15""
  }, {
    ""RecordNumberDed"": ""0003"",
    ""EmployeePayDeductionAmount"": ""       44.15""
  }, {
    ""RecordNumberDed"": ""0004"",
    ""EmployeePayDeductionAmount"": ""       10.33""
  }, {
    ""RecordNumberDed"": ""0005"",
    ""EmployeePayDeductionAmount"": ""       10.33""
  }, {
    ""RecordNumberDed"": ""0006"",
    ""EmployeePayDeductionAmount"": ""       61.54""
  }, {
    ""RecordNumberDed"": ""0007"",
    ""EmployeePayDeductionAmount"": ""       13.22""
  }, {
    ""RecordNumberDed"": ""0008"",
    ""EmployeePayDeductionAmount"": ""        7.84""
  }, {
    ""RecordNumberDed"": ""0009"",
    ""EmployeePayDeductionAmount"": ""        0.69""
  }, {
    ""RecordNumberDed"": ""0010"",
    ""EmployeePayDeductionAmount"": ""        5.00""
  }]
}" ...

Any suggestions on how I can "fix" this?

Thanks, David

A workaround for GROUP BY when a type does not support grouping (e.g. STRUCT or GEOGRAPHY) is to convert it to a STRING key. TO_JSON_STRING can do this for STRUCT, ST_AsText can do it for GEOGRAPHY, and so on. Note that this might not be very performant.
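
As a minimal sketch of the string-key idea, the query below groups on the JSON text of an array column instead of the array itself. The CTE `demo_rows` and its inline values are hypothetical stand-ins for the question's data, and the output names `EarningsKey` and `RowsInGroup` are illustrative only.

```sql
-- Hypothetical inline data: two rows carrying the same array value.
WITH demo_rows AS (
  SELECT 'xxxx521' AS EmployeeNumber,
         [STRUCT('0001' AS RecordNumberEarn,
                 '375.52' AS EmployeeEarningsAmount)] AS EmployeeEarningsDetail
  UNION ALL
  SELECT 'xxxx521',
         [STRUCT('0001' AS RecordNumberEarn,
                 '375.52' AS EmployeeEarningsAmount)]
)
SELECT
  EmployeeNumber,
  -- The ARRAY column cannot be a grouping key, but its JSON text can.
  TO_JSON_STRING(EmployeeEarningsDetail) AS EarningsKey,
  COUNT(*) AS RowsInGroup
FROM demo_rows
GROUP BY
  EmployeeNumber,
  TO_JSON_STRING(EmployeeEarningsDetail)
```

Because both rows serialize to the same JSON text, they should fall into a single group. The key is only a string, though, which is why the question's attempt returned a string column rather than an array.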

To get the original value back without any modification, use the ANY_VALUE aggregate function. It picks an arbitrary value from its inputs; here all the values within a group are presumably the same, so it does not matter which one is picked.
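
Applied to the question's query, this might look like the sketch below. The table and column names come from the question; the CTE names `Earnings` and `Deductions` are made up here. The sketch also assumes that the question's self-join against the staging table can be dropped, since the inner SELECT DISTINCT over the '0015' rows appears to yield the same rows. Treat it as an untested outline rather than a drop-in replacement.

```sql
WITH Earnings AS (
  -- One row per employee, with the earnings already collected into an array.
  SELECT
    EmployeeNumber,
    ARRAY_AGG(STRUCT(RecordNumberEarn, EmployeeEarningsAmount)) AS EmployeeEarningsDetail
  FROM (
    SELECT DISTINCT
      EmployeeNumber,
      SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberEarn,
      SPLIT(VariableData, '~')[SAFE_OFFSET(4)] AS EmployeeEarningsAmount
    FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
    WHERE RecordTypeNumber = '0015'
  )
  GROUP BY EmployeeNumber
),
Deductions AS (
  -- One row per deduction record.
  SELECT DISTINCT
    EmployeeNumber,
    SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberDed,
    SPLIT(VariableData, '~')[SAFE_OFFSET(5)] AS EmployeePayDeductionAmount
  FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
  WHERE RecordTypeNumber = '0025'
)
SELECT
  e.EmployeeNumber,
  -- Every joined row for an employee carries the same earnings array,
  -- so ANY_VALUE returns it unchanged and no ARRAY appears in GROUP BY.
  ANY_VALUE(e.EmployeeEarningsDetail) AS EmployeeEarningsDetail,
  ARRAY_AGG(STRUCT(d.RecordNumberDed, d.EmployeePayDeductionAmount)) AS EmployeeDeductionsDetail
FROM Earnings AS e
INNER JOIN Deductions AS d
  ON e.EmployeeNumber = d.EmployeeNumber
GROUP BY e.EmployeeNumber
```

Since `Earnings` already has one row per employee, grouping by EmployeeNumber alone should be enough here; the TO_JSON_STRING key is only needed when the array itself has to distinguish groups. Note that the INNER JOIN drops employees with no '0025' rows, as in the question's own query.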
