
SPARK SQL GROUPING SETS

I need to pass various combinations of column sets to my SQL query as a parameter.

e.g.:

val result = sqlContext.sql("""
  select col1,col2,col3,col4,col5,count(col6)
  from T1
  group by col1,col2,col3,col4,col5
  grouping sets ((col1,col2),(col3,col4),(col4,col5))
""")

There are several combinations for which I need to find the aggregated value. Is there any way to pass these sets of columns as a parameter to the SQL query instead of hard-coding them manually?

Currently I have provided all the combinations in the SQL query, but if any new combination comes up I would need to change the query again. I am planning to keep all the combinations in a file, read them all, and pass them as a parameter to the SQL query. Is that possible?
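The file-driven approach described above can be sketched in plain Scala: a helper turns one column set per line into a GROUPING SETS clause that is interpolated into the query string. The file name "grouping_sets.txt" and the one-set-per-line format are assumptions for illustration, not part of the original question.

```scala
// Build a GROUPING SETS clause from column sets, one set per entry,
// e.g. Seq("col1,col2", "col3,col4") -> "((col1,col2),(col3,col4))"
def buildGroupingSets(columnSets: Seq[String]): String =
  columnSets
    .map(_.trim)
    .filter(_.nonEmpty)
    .map(cols => s"($cols)")       // wrap each set in parentheses
    .mkString("(", ",", ")")       // wrap the whole list once more

// In practice the sets would come from a file (hypothetical name):
// val sets = scala.io.Source.fromFile("grouping_sets.txt").getLines().toSeq
val sets = Seq("col1,col2", "col3,col4", "col4,col5")

val query =
  s"""select col1,col2,col3,col4,col5,count(col6)
     |from T1
     |group by col1,col2,col3,col4,col5
     |grouping sets ${buildGroupingSets(sets)}""".stripMargin
```

Adding a new combination then only means adding a line to the file; the query text never changes.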

Example table:

id  category  age  gender  cust_id
1   101       54   M       1111
1   101       54   M       2222
1   101       55   M       3333
1   102       55   F       4444

""" select id, category, age, gender, count(cust_id) from T1
    group by id, category, age, gender
    grouping sets ((id,category),(id,age),(id,gender)) """

It should produce the result below:

group by (id, category) - count of cust_id
1 101 3
1 102 1
group by (id, age) - count of cust_id
1 54 2
1 55 2
group by (id, gender) - count of cust_id
1 M 3
1 F 1

This is just an example - I need to pass various different combinations to GROUPING SETS (not all combinations) as a parameter, either in one go or separately.

Any help would be really appreciated.

Thanks a lot.

You can build the SQL dynamically:

// original slices
var slices = List("(col1, col2)", "(col3, col4)", "(col4, col5)")
// adding new slice
slices = "(col1, col5)" :: slices 
// building SQL dynamically
val q =
s"""
with t1 as
(select 1 col1, 2 col2, 3 col3,
        4 col4, 5 col5, 6 col6)
select col1,col2,col3,col4,col5,count(col6)
  from t1
group by col1,col2,col3,col4,col5
grouping sets ${slices.mkString("(", ",", ")")}
"""
// output
spark.sql(q).show

Result:

scala> spark.sql(q).show
+----+----+----+----+----+-----------+
|col1|col2|col3|col4|col5|count(col6)|
+----+----+----+----+----+-----------+
|   1|null|null|null|   5|          1|
|   1|   2|null|null|null|          1|
|null|null|   3|   4|null|          1|
|null|null|null|   4|   5|          1|
+----+----+----+----+----+-----------+

"combination of column sets to my sql query as parameter"

The SQL is executed by Spark, not by the source database; it won't reach MySQL at all.

"I have provided all the combination"

You don't need GROUPING SETS if you want all possible combinations; just use CUBE:

SELECT ... FROM table GROUP BY CUBE (col1,col2,col3,col4,col5)
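For context, CUBE over n columns is shorthand for GROUPING SETS over all 2^n subsets of those columns. A plain-Scala sketch of that expansion (illustration only, not part of the answer's code):

```scala
// Enumerate every subset of the given columns; CUBE(col1..coln) aggregates
// over exactly these 2^n grouping sets.
def allSubsets(cols: Seq[String]): Seq[Seq[String]] =
  cols.foldLeft(Seq(Seq.empty[String])) { (acc, c) =>
    acc ++ acc.map(_ :+ c)   // each existing subset, with and without c
  }

val subsets = allSubsets(Seq("col1", "col2"))
// Seq(Seq(), Seq(col1), Seq(col2), Seq(col1, col2))
```

So CUBE is the right tool only when you genuinely want every combination; for a hand-picked list of sets, stick with GROUPING SETS built dynamically as shown above.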
