
Are nested queries supported in Spark SQL?

I made a table like

spark.sql("""create table temp stored as parquet TBLPROPERTIES('parquet.compression'='snappy') AS...""")

and I can query the table. However, when I use this table in a subquery like

spark.sql("""
SELECT DISTINCT 
...
AND col in (select * from temp)
...
""")

I get this error

org.apache.spark.sql.AnalysisException: Table or view not found: `temp`

Is this a limitation of Spark? (The original table and column names are withheld for confidentiality.)

Yes, nested queries are supported by Spark SQL.

Since you didn't give a complete example, consider the data set below, stored in a table named temp created by a CREATE statement like yours.

+---+-------------+------+
| id|         name|gender|
+---+-------------+------+
|  1|   John Smith|  Male|
|  2|   John Smith|  null|
|  3|Rebecca Davis|Female|
|  4|   Paul Adams|  Male|
|  5|         null|  null|
+---+-------------+------+

To get the records whose name appears exactly once, you can write something like this:

spark.sql("""SELECT T.* FROM temp AS T 
              RIGHT JOIN 
              (SELECT name, COUNT(name) AS count_name FROM temp GROUP BY name HAVING count_name = 1) AS sq 
              ON sq.name=T.name""")

This will give you,

+---+-------------+------+
| id|         name|gender|
+---+-------------+------+
|  3|Rebecca Davis|Female|
|  4|   Paul Adams|  Male|
+---+-------------+------+

For your case, a "Table or view not found" error usually means the table isn't visible where the query runs, e.g. it was created in a different session or database, so I'd suggest double-checking the table name and the database/session your subquery is executed in.
