简体   繁体   中英

Use of Hierarchical queries in Apache SPARK

I am trying to run below SQL query in SPARK using Java :

Dataset<Row> perIDDf = sparkSession.read().format("jdbc").option("url", connection).option("dbtable", "CI_PER_PER").load();


            perIDDf.createOrReplaceTempView("CI_PER_PER");
            Dataset<Row> perPerDF = sparkSession.sql("select per_id1,per_id2 " + 
                    "from CI_PER_PER " + 
                    "start with per_id1='2001822000' " + 
                    "connect by prior per_id1=per_id2");
            perPerDF.show(10,false);

I am getting below error:

Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'with' expecting <EOF>(line 1, pos 45)

== SQL ==
select per_id1,per_id2 from CI_PER_PER start with per_id1='2001822000' connect by prior per_id1=per_id2
---------------------------------------------^^^

        at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:239)
        at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:115)
        at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
        at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
        at com.tfmwithspark.TestMaterializedView.main(TestMaterializedView.java:127)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Basically I am trying to use Hierarchical query in SPARK.

Is it not supported?

SPARK VERSION : 2.3.0

Hierarchical query is not supported with Spark currently, nor recursion in the query. WITH in the most limited fashion, is.

You can approximate this, but is is arduous. Here is an approach, but I do not really recommend it: http://sqlandhadoop.com/how-to-implement-recursive-queries-in-spark/

PR for this is already raised check this

work around what you can do is below:

parent_query = """
SELECT asset_id as parent_id FROM {0}.{1}
where name = 'ROOT'
""".format(db_name,table_name)

parent_df = spark.sql(parent_query)
final_df = parent_df


child_query = """
SELECT parent_id as parent_to_drop,asset_id
FROM
{0}.{1}
""".format(db_name,table_name)

child_df = spark.sql(child_query)

count = 1
while count > 0:

  join_df = child_df.join(parent_df,(child_df.parent_to_drop == parent_df.parent_id)) \
        .drop("parent_to_drop") \
        .drop("parent_id") \
        .withColumnRenamed("asset_id","parent_id")
  count = join_df.count()
  final_df = final_df.union(join_df)
  parent_df = join_df

print("----------final-----------")
print(final_df.count())
final_df.show()

data : 在此处输入图片说明

result :
----------final-----------

8

+---------+
|parent_id|
+---------+
|        0|
|        1|
|        5|
|        2|
|        7|
|        4|
|        3|
|        6|
+---------+

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM