BigQuery: Multiple Independent Arrays of Structures
I am trying to create two arrays of structures that are independent of each other.
I am trying to build these arrays from another table that is an import of a flat file with different record types depending on the data (Rec0015 - Earnings and Rec0025 - Deductions). Each record type has a slightly different layout, so I import the record data as a string that is "~" delimited. I have simplified my examples below to show only the basic data needed to illustrate my issue.
If I run the queries independently of each other, each array is created properly; however, when I combine both, I cannot get it to work.
For the first query:
SELECT
  flat.EmployeeNumber,
  ARRAY_AGG(STRUCT(
    flat.RecordNumberEarn,
    flat.EmployeeEarningsAmount
  )) AS EmployeeEarningsDetail
FROM (
  SELECT DISTINCT
    stage.EmployeeNumber,
    Rec0015.RecordNumberEarn,
    Rec0015.EmployeeEarningsAmount
  FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging stage
  INNER JOIN (
    SELECT
      EmployeeNumber,
      SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberEarn,
      SPLIT(VariableData, '~')[SAFE_OFFSET(4)] AS EmployeeEarningsAmount
    FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
    WHERE RecordTypeNumber = '0015'
  ) Rec0015
    ON stage.EmployeeNumber = Rec0015.EmployeeNumber
) AS flat
GROUP BY
  flat.EmployeeNumber
I get the following result (which is correct):
EmployeeNumber EmployeeEarningsDetail
xxxx521 "{
""EmployeeEarningsDetail"": [{
""RecordNumberEarn"": ""0001"",
""EmployeeEarningsAmount"": "" 375.52""
}, {
""RecordNumberEarn"": ""0002"",
""EmployeeEarningsAmount"": "" 387.26""
}]
}" ....
When I run the "full" query:
SELECT
  flat2.EmployeeNumber,
  ARRAY_AGG(STRUCT(
    Rec0025.RecordNumberDed,
    Rec0025.EmployeePayDeductionAmount
  )) AS EmployeeDeductionsDetail
FROM (
  SELECT
    flat.EmployeeNumber,
    ARRAY_AGG(STRUCT(
      flat.RecordNumberEarn,
      flat.EmployeeEarningsAmount
    )) AS EmployeeEarningsDetail
  FROM (
    SELECT DISTINCT
      stage.EmployeeNumber,
      Rec0015.RecordNumberEarn,
      Rec0015.EmployeeEarningsAmount
    FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging stage
    INNER JOIN (
      SELECT
        EmployeeNumber,
        SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberEarn,
        SPLIT(VariableData, '~')[SAFE_OFFSET(4)] AS EmployeeEarningsAmount
      FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
      WHERE RecordTypeNumber = '0015'
    ) Rec0015
      ON stage.EmployeeNumber = Rec0015.EmployeeNumber
  ) AS flat
  GROUP BY
    flat.EmployeeNumber
) AS flat2
INNER JOIN (
  SELECT DISTINCT
    EmployeeNumber,
    SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberDed,
    SPLIT(VariableData, '~')[SAFE_OFFSET(5)] AS EmployeePayDeductionAmount
  FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
  WHERE RecordTypeNumber = '0025'
) Rec0025
  ON flat2.EmployeeNumber = Rec0025.EmployeeNumber
GROUP BY
  flat2.EmployeeNumber
I get the following result (which is correct, but doesn't include the first array structure):
EmployeeNumber EmployeeDeductionsDetail
xxxx521 "{
""EmployeeDeductionsDetail"": [{
""RecordNumberDed"": ""0001"",
""EmployeePayDeductionAmount"": "" 50.65""
}, {
""RecordNumberDed"": ""0002"",
""EmployeePayDeductionAmount"": "" 44.15""
}, {
""RecordNumberDed"": ""0003"",
""EmployeePayDeductionAmount"": "" 44.15""
}, {
""RecordNumberDed"": ""0004"",
""EmployeePayDeductionAmount"": "" 10.33""
}, {
""RecordNumberDed"": ""0005"",
""EmployeePayDeductionAmount"": "" 10.33""
}, {
""RecordNumberDed"": ""0006"",
""EmployeePayDeductionAmount"": "" 61.54""
}, {
""RecordNumberDed"": ""0007"",
""EmployeePayDeductionAmount"": "" 13.22""
}, {
""RecordNumberDed"": ""0008"",
""EmployeePayDeductionAmount"": "" 7.84""
}, {
""RecordNumberDed"": ""0009"",
""EmployeePayDeductionAmount"": "" 0.69""
}, {
""RecordNumberDed"": ""0010"",
""EmployeePayDeductionAmount"": "" 5.00""
}]
}" ...
However, in the second query I really want to group by the array structure EmployeeEarningsDetail as well, but when I add the array to the SELECT and the GROUP BY I get the error:
"Grouping by expressions of type ARRAY is not allowed."
I tried adding TO_JSON_STRING(EmployeeEarningsDetail) to both the SELECT and the GROUP BY, but I got a column containing just the string, not an array, as below:
SELECT
  flat2.EmployeeNumber,
  TO_JSON_STRING(flat2.EmployeeEarningsDetail),
  ARRAY_AGG(STRUCT(
    Rec0025.RecordNumberDed,
    Rec0025.EmployeePayDeductionAmount
  )) AS EmployeeDeductionsDetail
FROM (
  SELECT
    flat.EmployeeNumber,
    ARRAY_AGG(STRUCT(
      flat.RecordNumberEarn,
      flat.EmployeeEarningsAmount
    )) AS EmployeeEarningsDetail
  FROM (
    SELECT DISTINCT
      stage.EmployeeNumber,
      Rec0015.RecordNumberEarn,
      Rec0015.EmployeeEarningsAmount
    FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging stage
    INNER JOIN (
      SELECT
        EmployeeNumber,
        SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberEarn,
        SPLIT(VariableData, '~')[SAFE_OFFSET(4)] AS EmployeeEarningsAmount
      FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
      WHERE RecordTypeNumber = '0015'
    ) Rec0015
      ON stage.EmployeeNumber = Rec0015.EmployeeNumber
  ) AS flat
  GROUP BY
    flat.EmployeeNumber
) AS flat2
INNER JOIN (
  SELECT DISTINCT
    EmployeeNumber,
    SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberDed,
    SPLIT(VariableData, '~')[SAFE_OFFSET(5)] AS EmployeePayDeductionAmount
  FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
  WHERE RecordTypeNumber = '0025'
) Rec0025
  ON flat2.EmployeeNumber = Rec0025.EmployeeNumber
GROUP BY
  flat2.EmployeeNumber,
  TO_JSON_STRING(flat2.EmployeeEarningsDetail)
The results (below) are not correct at all: the JSON shows an f0_ string containing all the earnings and the first deduction, followed by another row with the rest of the deductions in an array. See below:
EmployeeNumber f0_ EmployeeDeductionsDetail
xxxx521 "[{""RecordNumberEarn"":""0001"",""EmployeeEarningsAmount"":"" 375.52""},{""RecordNumberEarn"":""0002"",""EmployeeEarningsAmount"":"" 387.26""}]" "{
""EmployeeDeductionsDetail"": [{
""RecordNumberDed"": ""0001"",
""EmployeePayDeductionAmount"": "" 50.65""
}, {
""RecordNumberDed"": ""0002"",
""EmployeePayDeductionAmount"": "" 44.15""
}, {
""RecordNumberDed"": ""0003"",
""EmployeePayDeductionAmount"": "" 44.15""
}, {
""RecordNumberDed"": ""0004"",
""EmployeePayDeductionAmount"": "" 10.33""
}, {
""RecordNumberDed"": ""0005"",
""EmployeePayDeductionAmount"": "" 10.33""
}, {
""RecordNumberDed"": ""0006"",
""EmployeePayDeductionAmount"": "" 61.54""
}, {
""RecordNumberDed"": ""0007"",
""EmployeePayDeductionAmount"": "" 13.22""
}, {
""RecordNumberDed"": ""0008"",
""EmployeePayDeductionAmount"": "" 7.84""
}, {
""RecordNumberDed"": ""0009"",
""EmployeePayDeductionAmount"": "" 0.69""
}, {
""RecordNumberDed"": ""0010"",
""EmployeePayDeductionAmount"": "" 5.00""
}]
}" ...
I have also tried putting both ARRAY_AGGs in the same SELECT, but then my arrays are Cartesian products.
The SQL is:
SELECT
  flat.EmployeeNumber,
  ARRAY_AGG(STRUCT(
    flat.RecordNumberEarn,
    flat.EmployeeEarningsAmount
  )) AS EmployeeEarningsDetail,
  ARRAY_AGG(STRUCT(
    flat.RecordNumberDed,
    flat.EmployeePayDeductionAmount
  )) AS EmployeeDeductionsDetail
FROM (
  SELECT DISTINCT
    stage.EmployeeNumber,
    Rec0015.RecordNumberEarn,
    Rec0015.EmployeeEarningsAmount,
    Rec0025.RecordNumberDed,
    Rec0025.EmployeePayDeductionAmount
  FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging stage
  INNER JOIN (
    SELECT
      EmployeeNumber,
      SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberEarn,
      SPLIT(VariableData, '~')[SAFE_OFFSET(4)] AS EmployeeEarningsAmount
    FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
    WHERE RecordTypeNumber = '0015'
  ) Rec0015
    ON stage.EmployeeNumber = Rec0015.EmployeeNumber
  INNER JOIN (
    SELECT DISTINCT
      EmployeeNumber,
      SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberDed,
      SPLIT(VariableData, '~')[SAFE_OFFSET(5)] AS EmployeePayDeductionAmount
    FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
    WHERE RecordTypeNumber = '0025'
  ) Rec0025
    ON stage.EmployeeNumber = Rec0025.EmployeeNumber
) AS flat
GROUP BY
  flat.EmployeeNumber
and the results are:
EmployeeNumber EmployeeEarningsDetail EmployeeDeductionsDetail
xxxx521 "{
""EmployeeEarningsDetail"": [{
""RecordNumberEarn"": ""0001"",
""EmployeeEarningsAmount"": "" 375.52""
}, {
""RecordNumberEarn"": ""0001"",
""EmployeeEarningsAmount"": "" 375.52""
}, {
""RecordNumberEarn"": ""0001"",
""EmployeeEarningsAmount"": "" 375.52""
}, {
""RecordNumberEarn"": ""0001"",
""EmployeeEarningsAmount"": "" 375.52""
}, {
""RecordNumberEarn"": ""0001"",
""EmployeeEarningsAmount"": "" 375.52""
}, {
""RecordNumberEarn"": ""0001"",
""EmployeeEarningsAmount"": "" 375.52""
}, {
""RecordNumberEarn"": ""0001"",
""EmployeeEarningsAmount"": "" 375.52""
}, {
""RecordNumberEarn"": ""0001"",
""EmployeeEarningsAmount"": "" 375.52""
}, {
""RecordNumberEarn"": ""0001"",
""EmployeeEarningsAmount"": "" 375.52""
}, {
""RecordNumberEarn"": ""0001"",
""EmployeeEarningsAmount"": "" 375.52""
}, {
""RecordNumberEarn"": ""0002"",
""EmployeeEarningsAmount"": "" 387.26""
}, {
""RecordNumberEarn"": ""0002"",
""EmployeeEarningsAmount"": "" 387.26""
}, {
""RecordNumberEarn"": ""0002"",
""EmployeeEarningsAmount"": "" 387.26""
}, {
""RecordNumberEarn"": ""0002"",
""EmployeeEarningsAmount"": "" 387.26""
}, {
""RecordNumberEarn"": ""0002"",
""EmployeeEarningsAmount"": "" 387.26""
}, {
""RecordNumberEarn"": ""0002"",
""EmployeeEarningsAmount"": "" 387.26""
}, {
""RecordNumberEarn"": ""0002"",
""EmployeeEarningsAmount"": "" 387.26""
}, {
""RecordNumberEarn"": ""0002"",
""EmployeeEarningsAmount"": "" 387.26""
}, {
""RecordNumberEarn"": ""0002"",
""EmployeeEarningsAmount"": "" 387.26""
}, {
""RecordNumberEarn"": ""0002"",
""EmployeeEarningsAmount"": "" 387.26""
}]
}" "{
""EmployeeDeductionsDetail"": [{
""RecordNumberDed"": ""0001"",
""EmployeePayDeductionAmount"": "" 50.65""
}, {
""RecordNumberDed"": ""0002"",
""EmployeePayDeductionAmount"": "" 44.15""
}, {
""RecordNumberDed"": ""0003"",
""EmployeePayDeductionAmount"": "" 44.15""
}, {
""RecordNumberDed"": ""0004"",
""EmployeePayDeductionAmount"": "" 10.33""
}, {
""RecordNumberDed"": ""0005"",
""EmployeePayDeductionAmount"": "" 10.33""
}, {
""RecordNumberDed"": ""0006"",
""EmployeePayDeductionAmount"": "" 61.54""
}, {
""RecordNumberDed"": ""0007"",
""EmployeePayDeductionAmount"": "" 13.22""
}, {
""RecordNumberDed"": ""0008"",
""EmployeePayDeductionAmount"": "" 7.84""
}, {
""RecordNumberDed"": ""0009"",
""EmployeePayDeductionAmount"": "" 0.69""
}, {
""RecordNumberDed"": ""0010"",
""EmployeePayDeductionAmount"": "" 5.00""
}, {
""RecordNumberDed"": ""0001"",
""EmployeePayDeductionAmount"": "" 50.65""
}, {
""RecordNumberDed"": ""0002"",
""EmployeePayDeductionAmount"": "" 44.15""
}, {
""RecordNumberDed"": ""0003"",
""EmployeePayDeductionAmount"": "" 44.15""
}, {
""RecordNumberDed"": ""0004"",
""EmployeePayDeductionAmount"": "" 10.33""
}, {
""RecordNumberDed"": ""0005"",
""EmployeePayDeductionAmount"": "" 10.33""
}, {
""RecordNumberDed"": ""0006"",
""EmployeePayDeductionAmount"": "" 61.54""
}, {
""RecordNumberDed"": ""0007"",
""EmployeePayDeductionAmount"": "" 13.22""
}, {
""RecordNumberDed"": ""0008"",
""EmployeePayDeductionAmount"": "" 7.84""
}, {
""RecordNumberDed"": ""0009"",
""EmployeePayDeductionAmount"": "" 0.69""
}, {
""RecordNumberDed"": ""0010"",
""EmployeePayDeductionAmount"": "" 5.00""
}]
}" ...
Any suggestions on how I can "fix" this?
Thanks, David
A workaround for GROUP BY when the type does not support grouping, e.g. STRUCT or GEOGRAPHY, is to turn it into a STRING key. TO_JSON_STRING can do it for STRUCT, ST_AsText can do it for GEOGRAPHY, etc. Note that this might not be very performant.
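As a minimal sketch of the string-key idea, the query below groups rows by the JSON text of an array column instead of the array itself. The `sample` data, the `id` column, and the `detail_key` alias are made up for illustration and are not part of the original tables.

```sql
-- Hypothetical inline data: rows 1 and 2 carry the same array value, row 3 differs.
WITH sample AS (
  SELECT 1 AS id,
         [STRUCT('0001' AS RecordNumberEarn, '375.52' AS EmployeeEarningsAmount)] AS detail
  UNION ALL
  SELECT 2,
         [STRUCT('0001' AS RecordNumberEarn, '375.52' AS EmployeeEarningsAmount)]
  UNION ALL
  SELECT 3,
         [STRUCT('0002' AS RecordNumberEarn, '387.26' AS EmployeeEarningsAmount)]
)
SELECT
  TO_JSON_STRING(detail) AS detail_key,  -- STRING stand-in for the array, so it can be grouped
  COUNT(*) AS row_count
FROM sample
GROUP BY detail_key
```

The key is only as good as its serialization: two values group together exactly when their JSON text is identical.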
To get the original value back without any modification, use the ANY_VALUE aggregation function. It picks an arbitrary value from its inputs; here all the values in a group are presumably the same, so we don't care which one it picks.
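Applied to the shape of the "full" query in the question, this could look roughly like the sketch below. It assumes the earnings subquery already yields one row per EmployeeNumber, so every joined deduction row carries the same earnings array. The inline `staging` rows and their filler fields (a, b, c, d) are hypothetical stand-ins for bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging, and the subqueries are written as CTEs only to keep the example short.

```sql
-- Hypothetical inline stand-in for the staging table; the filler fields
-- only pad the "~"-delimited layout out to the offsets used in the question.
WITH staging AS (
  SELECT * FROM UNNEST([
    STRUCT('xxxx521' AS EmployeeNumber, '0015' AS RecordTypeNumber,
           '0001~a~b~c~375.52' AS VariableData),
    ('xxxx521', '0015', '0002~a~b~c~387.26'),
    ('xxxx521', '0025', '0001~a~b~c~d~50.65'),
    ('xxxx521', '0025', '0002~a~b~c~d~44.15')
  ])
),
flat2 AS (
  -- One row per employee, earnings already collapsed into an array.
  SELECT
    EmployeeNumber,
    ARRAY_AGG(STRUCT(
      SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberEarn,
      SPLIT(VariableData, '~')[SAFE_OFFSET(4)] AS EmployeeEarningsAmount
    )) AS EmployeeEarningsDetail
  FROM staging
  WHERE RecordTypeNumber = '0015'
  GROUP BY EmployeeNumber
),
Rec0025 AS (
  SELECT DISTINCT
    EmployeeNumber,
    SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberDed,
    SPLIT(VariableData, '~')[SAFE_OFFSET(5)] AS EmployeePayDeductionAmount
  FROM staging
  WHERE RecordTypeNumber = '0025'
)
SELECT
  flat2.EmployeeNumber,
  -- All joined rows for an employee hold the same earnings array, so
  -- ANY_VALUE returns it as an array without it appearing in GROUP BY.
  ANY_VALUE(flat2.EmployeeEarningsDetail) AS EmployeeEarningsDetail,
  ARRAY_AGG(STRUCT(
    Rec0025.RecordNumberDed,
    Rec0025.EmployeePayDeductionAmount
  )) AS EmployeeDeductionsDetail
FROM flat2
INNER JOIN Rec0025
  ON flat2.EmployeeNumber = Rec0025.EmployeeNumber
GROUP BY flat2.EmployeeNumber
```

Because the grouping key is just EmployeeNumber, the TO_JSON_STRING key is not needed in this particular shape; it becomes relevant only if rows within a group could carry different array values.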