
How can I pivot a dataset in Google BigQuery?

I have a massive dataset with this schema:

Customer    INTEGER
CategoryID  INTEGER
CategoryName    STRING
ProjectStage    INTEGER
NextStepID  INTEGER
NextStepName    STRING
NextStepIsAnchor    BOOLEAN

I need to get a result set where each customer appears on only one row and his/her next steps are spread across columns, like this:

Customer | CategoryID | CategoryName | ProjectStage | NextStep1ID | NextStep1Name | NextStep2ID | NextStep2Name | ... etc.

I tried the NTH function of BigQuery, but it works only for the first occurrence of NextStepID:

SELECT 
customer, 
nth(1, NextStepID)
FROM [2015_05.customers_wunique_nextsteps] 
group by customer

But when I try to add more columns:

SELECT 
customer, 
nth(1, NextStepID),
nth(2, NextStepID)
FROM [2015_05.customers_wunique_nextsteps] 
group by customer

I get this error:

Error: Function 'NTH(2, [NextStepID])' cannot be used in a distributed query, this function can only be correctly computed for queries that run on a single node.

Any ideas? Right now I "pivot" the results with Excel and a small VBA script, but as the datasets grow, the calculation time exceeds all limits...

Thanks in advance! :)

The NTH function applies to REPEATED fields, where it selects the nth repeated element (the error message could be improved). So the first step is to build a REPEATED field out of NextStepID, which can be done with the NEST aggregation function. You can then use NTH as a scoped aggregation function:

SELECT
  Customer,
  NTH(1, NextStepID) WITHIN RECORD AS NextStepID1,
  NTH(2, NextStepID) WITHIN RECORD AS NextStepID2,
  NTH(3, NextStepID) WITHIN RECORD AS NextStepID3
FROM (
SELECT Customer, NEST(NextStepID) AS NextStepID
FROM [2015_05.customers_wunique_nextsteps] GROUP BY Customer)
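
The query above uses legacy SQL syntax (NEST, WITHIN RECORD). As a rough sketch only, the same idea might be written in BigQuery standard SQL with ARRAY_AGG and SAFE_OFFSET, which returns NULL when a customer has fewer steps than the requested position. Ordering the steps by NextStepID is an assumption made here for illustration; the original query does not define which step comes first, and the output column names are simply taken from the layout requested in the question.

```sql
#standardSQL
-- Sketch under assumptions: steps are ordered by NextStepID (not stated in
-- the original post). SAFE_OFFSET is zero-based and yields NULL when the
-- array is shorter than the requested position.
SELECT
  Customer,
  ARRAY_AGG(NextStepID   ORDER BY NextStepID)[SAFE_OFFSET(0)] AS NextStep1ID,
  ARRAY_AGG(NextStepName ORDER BY NextStepID)[SAFE_OFFSET(0)] AS NextStep1Name,
  ARRAY_AGG(NextStepID   ORDER BY NextStepID)[SAFE_OFFSET(1)] AS NextStep2ID,
  ARRAY_AGG(NextStepName ORDER BY NextStepID)[SAFE_OFFSET(1)] AS NextStep2Name
FROM `2015_05.customers_wunique_nextsteps`
GROUP BY Customer
```

Further NextStepN columns would follow the same pattern with a higher offset; the number of columns still has to be fixed in the query text.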
