Use of hierarchical queries in Apache Spark
I am trying to run the following SQL query in Spark using Java:
Dataset<Row> perIDDf = sparkSession.read()
    .format("jdbc")
    .option("url", connection)
    .option("dbtable", "CI_PER_PER")
    .load();
perIDDf.createOrReplaceTempView("CI_PER_PER");

Dataset<Row> perPerDF = sparkSession.sql("select per_id1,per_id2 " +
    "from CI_PER_PER " +
    "start with per_id1='2001822000' " +
    "connect by prior per_id1=per_id2");
perPerDF.show(10, false);
I get the following error:
Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'with' expecting <EOF>(line 1, pos 45)
== SQL ==
select per_id1,per_id2 from CI_PER_PER start with per_id1='2001822000' connect by prior per_id1=per_id2
---------------------------------------------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:239)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:115)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
at com.tfmwithspark.TestMaterializedView.main(TestMaterializedView.java:127)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Basically, I want to use a hierarchical query in Spark.
Is it not supported?
Spark version: 2.3.0
Hierarchical queries are not currently supported by Spark, nor is recursion in queries, even in the most limited fashion.
You can approximate it, but it is laborious. Here is one approach, although I do not really recommend it: http://sqlandhadoop.com/how-to-implement-recursive-queries-in-spark/
A PR has already been raised for this; check this.
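One alternative worth noting: since the CONNECT BY syntax belongs to the source database (Oracle) rather than to Spark, the hierarchical query can be pushed down through the JDBC reader by passing a subquery as the dbtable option, so the database resolves the hierarchy and Spark only receives the flat result. A minimal sketch (the Spark call is shown commented out, since it needs a live Oracle connection; spark and connection are assumed to exist as in the question):

```python
# Build the pushdown subquery. Spark's JDBC reader accepts "(subquery) alias"
# wherever a table name is expected, so the CONNECT BY runs inside Oracle.
pushdown_query = (
    "(select per_id1, per_id2 "
    "from CI_PER_PER "
    "start with per_id1 = '2001822000' "
    "connect by prior per_id1 = per_id2) t"
)

# Hypothetical usage, mirroring the question's reader options:
# per_per_df = spark.read.format("jdbc") \
#     .option("url", connection) \
#     .option("dbtable", pushdown_query) \
#     .load()

print(pushdown_query)
```

This only helps when the data still lives in a database that supports hierarchical SQL; it does not make Spark itself understand CONNECT BY.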
As a workaround, you can do something like the following:
parent_query = """
SELECT asset_id as parent_id FROM {0}.{1}
where name = 'ROOT'
""".format(db_name,table_name)
parent_df = spark.sql(parent_query)
final_df = parent_df
child_query = """
SELECT parent_id as parent_to_drop,asset_id
FROM
{0}.{1}
""".format(db_name,table_name)
child_df = spark.sql(child_query)
count = 1
while count > 0:
join_df = child_df.join(parent_df,(child_df.parent_to_drop == parent_df.parent_id)) \
.drop("parent_to_drop") \
.drop("parent_id") \
.withColumnRenamed("asset_id","parent_id")
count = join_df.count()
final_df = final_df.union(join_df)
parent_df = join_df
print("----------final-----------")
print(final_df.count())
final_df.show()
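The loop above is just a level-by-level (breadth-first) expansion: each pass joins the current frontier of parents against the child table until no new children appear. The same idea can be sketched in pure Python on a hypothetical toy hierarchy (no Spark required), which makes the mechanics easy to verify:

```python
# Toy parent -> children mapping (hypothetical data, standing in for the
# parent_to_drop/asset_id table in the DataFrame version above).
edges = {0: [1, 2], 1: [3, 4], 2: [5], 5: [6, 7]}

frontier = [0]           # the 'ROOT' row, like parent_df
result = list(frontier)  # accumulated output, like final_df

while frontier:
    # Children of the current frontier: the per-iteration join.
    frontier = [c for p in frontier for c in edges.get(p, [])]
    result.extend(frontier)

print(sorted(result))  # prints [0, 1, 2, 3, 4, 5, 6, 7]
```

As in the DataFrame version, the loop terminates when an iteration produces an empty frontier, i.e. when count drops to 0.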
Result:
----------final-----------
8
+---------+
|parent_id|
+---------+
| 0|
| 1|
| 5|
| 2|
| 7|
| 4|
| 3|
| 6|
+---------+