I am trying to run the SQL query below in Spark using Java:
Dataset<Row> perIDDf = sparkSession.read().format("jdbc").option("url", connection).option("dbtable", "CI_PER_PER").load();
perIDDf.createOrReplaceTempView("CI_PER_PER");
Dataset<Row> perPerDF = sparkSession.sql("select per_id1,per_id2 " +
"from CI_PER_PER " +
"start with per_id1='2001822000' " +
"connect by prior per_id1=per_id2");
perPerDF.show(10,false);
I am getting the following error:
Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'with' expecting <EOF>(line 1, pos 45)
== SQL ==
select per_id1,per_id2 from CI_PER_PER start with per_id1='2001822000' connect by prior per_id1=per_id2
---------------------------------------------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:239)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:115)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
at com.tfmwithspark.TestMaterializedView.main(TestMaterializedView.java:127)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Basically, I am trying to use a hierarchical query in Spark. Is it not supported?
SPARK VERSION : 2.3.0
Hierarchical queries (such as Oracle's CONNECT BY) are not currently supported in Spark, nor is recursion in queries. WITH is supported, but only in its most limited, non-recursive form.
You can approximate this, but it is arduous. Here is an approach, though I do not really recommend it: http://sqlandhadoop.com/how-to-implement-recursive-queries-in-spark/
A PR for this has already been raised; check this.
As a workaround, you can do the following:
parent_query = """
SELECT asset_id as parent_id FROM {0}.{1}
where name = 'ROOT'
""".format(db_name,table_name)
parent_df = spark.sql(parent_query)
final_df = parent_df
child_query = """
SELECT parent_id as parent_to_drop,asset_id
FROM
{0}.{1}
""".format(db_name,table_name)
child_df = spark.sql(child_query)
count = 1
while count > 0:
join_df = child_df.join(parent_df,(child_df.parent_to_drop == parent_df.parent_id)) \
.drop("parent_to_drop") \
.drop("parent_id") \
.withColumnRenamed("asset_id","parent_id")
count = join_df.count()
final_df = final_df.union(join_df)
parent_df = join_df
print("----------final-----------")
print(final_df.count())
final_df.show()
Result:
----------final-----------
8
+---------+
|parent_id|
+---------+
| 0|
| 1|
| 5|
| 2|
| 7|
| 4|
| 3|
| 6|
+---------+
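To see why the loop terminates and what it computes, here is the same iterative expansion sketched in plain Python, without Spark. The `edges` list is hypothetical sample data (not from the post) shaped like the `(parent_id, asset_id)` table above:

```python
# Hypothetical (parent, child) edges; None marks the ROOT asset.
edges = [
    (None, 0),
    (0, 1), (0, 2),
    (1, 3), (1, 4),
    (2, 5), (2, 6),
    (5, 7),
]

# Seed with the root(s), mirroring the "name = 'ROOT'" query.
frontier = {child for parent, child in edges if parent is None}
result = set(frontier)

# Repeatedly "join" the current frontier against the edge table until a
# level produces no new rows -- the same termination test as `count > 0`.
while frontier:
    frontier = {child for parent, child in edges if parent in frontier}
    result |= frontier

print(sorted(result))  # -> [0, 1, 2, 3, 4, 5, 6, 7]
```

Each pass of the loop plays the role of one `child_df.join(parent_df, ...)` in the Spark version, so the number of iterations equals the depth of the hierarchy.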