Spark SQL GROUPING SETS

Question

I need to pass various combinations of column sets to my SQL query as parameters.

For example:

val result = sqlContext.sql(""" select col1, col2, col3, col4, col5, count(col6) from table T1 GROUP BY col1, col2, col3, col4, col5 GROUPING SETS ((col1, col2), (col3, col4), (col4, col5)) """)

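I need to compute aggregates for several combinations. Is there a way to pass these column sets to the SQL query as parameters instead of hard-coding them by hand?

At the moment I list every combination in the SQL query, but whenever a new combination comes up I have to change the query. I plan to keep all the combinations in a file, read them all, and pass them to the SQL query as parameters. Is that possible?

Example table: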

id  category  age  gender  cust_id

1   101       54   M       1111
1   101       54   M       2222
1   101       55   M       3333
1   102       55   F       4444

""" select id, category, age, gender, count(cust_id) from table T1 group by id, category, age, gender
GROUPING SETS ((id, category), (id, age), (id, gender)) """

It should produce the following result:

group by (id, category) - count of cust_id 
1 101 3
1 102 1
group by (id and age) - count of cust_id
1 54 2
1 55 2
group by (id and gender) - count cust_id
1 M 3
1 F 1

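This is only an example. I need to pass a varying selection of combinations (not all possible combinations) to GROUPING SETS in one go, as parameters.

Any help would be greatly appreciated.

Many thanks.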

Answer 1

You can build the SQL dynamically:

// original slices
var slices = List("(col1, col2)", "(col3, col4)", "(col4, col5)")
// adding new slice
slices = "(col1, col5)" :: slices 
// building SQL dynamically
val q =
s"""
with t1 as
(select 1 col1, 2 col2, 3 col3,
        4 col4, 5 col5, 6 col6)
select col1,col2,col3,col4,col5,count(col6)
  from t1
group by col1,col2,col3,col4,col5
grouping sets ${slices.mkString("(", ",", ")")}
"""
// output
spark.sql(q).show

Result

scala> spark.sql(q).show
+----+----+----+----+----+-----------+
|col1|col2|col3|col4|col5|count(col6)|
+----+----+----+----+----+-----------+
|   1|null|null|null|   5|          1|
|   1|   2|null|null|null|          1|
|null|null|   3|   4|null|          1|
|null|null|null|   4|   5|          1|
+----+----+----+----+----+-----------+
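
The question also mentions keeping the combinations in a file. Below is a minimal sketch of that idea, assuming a plain-text file with one grouping set per line and comma-separated column names. The helper names `toGroupingSet` and `buildQuery`, the file layout, and the temporary file standing in for the real one are all assumptions made for illustration, not part of the original answer. Because the column names are spliced into the SQL text, the file should come from a trusted source only.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import scala.io.Source

// Hypothetical helper: turns a line such as "id, category" into "(id, category)".
def toGroupingSet(line: String): String =
  line.split(",").map(_.trim).filter(_.nonEmpty).mkString("(", ", ", ")")

// Hypothetical helper: builds the full query from the lines of the file.
def buildQuery(lines: Seq[String], table: String, countCol: String): String = {
  val rows = lines.map(_.trim).filter(_.nonEmpty)
  // Every column used in any grouping set must also appear in SELECT and GROUP BY.
  val cols = rows.flatMap(_.split(",")).map(_.trim).filter(_.nonEmpty).distinct
  val sets = rows.map(toGroupingSet)
  s"""select ${cols.mkString(", ")}, count($countCol)
     |  from $table
     | group by ${cols.mkString(", ")}
     | grouping sets ${sets.mkString("(", ", ", ")")}""".stripMargin
}

// Stand-in for the real file (assumption): a temp file, one grouping set per line.
val path = Files.createTempFile("grouping_sets", ".txt")
Files.write(path, "id,category\nid,age\nid,gender\n".getBytes(StandardCharsets.UTF_8))

val source = Source.fromFile(path.toFile, "UTF-8")
val lines = try source.getLines().toList finally source.close()

val q = buildQuery(lines, "T1", "cust_id")
println(q)
// With a SparkSession in scope: spark.sql(q).show()
```

Adding a new combination then means adding a line to the file; the query text itself does not change.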

Answer 2

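"combinations of column sets passed as parameters to my SQL query"

The SQL is executed by Spark, not by the source database. It never reaches MySQL at all.

"I have already provided all the combinations"

If you need all possible combinations, you do not need GROUPING SETS. Just use CUBE:

SELECT ... FROM table GROUP BY col1,col2,col3,col4,col5 WITH CUBE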
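
To make the relationship concrete: a CUBE over n columns corresponds to GROUPING SETS over every subset of those columns, including the empty set (the grand total), so 2^n sets in all. The sketch below only enumerates those subsets as text for three illustrative column names; it is not how Spark implements CUBE. If only some of the combinations are wanted, as the question states, an explicit GROUPING SETS list is still the right tool.

```scala
// Illustrative column names (assumption); any list of columns works the same way.
val cols = List("col1", "col2", "col3")

// All subsets, from the full column list down to the empty set "()".
val allSets: Seq[String] =
  (cols.size to 0 by -1)
    .flatMap(n => cols.combinations(n))
    .map(_.mkString("(", ", ", ")"))

// The GROUPING SETS clause that covers the same groupings as a CUBE over cols.
val equivalent = s"GROUPING SETS ${allSets.mkString("(", ", ", ")")}"

println(s"${allSets.size} grouping sets")
println(equivalent)
```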

Solution 1
1 2017-10-18 13:35:02

Solution 2
0