How to create a partitioned table using Spark SQL

I know we can create a table with automatic partition discovery via:

CREATE TABLE my_table
USING com.databricks.spark.avro
OPTIONS (path "/path/to/table");

But this requires changing the data paths to the partition_key=partition_value format:

/path/to/table/dt=2016-10-09
/path/to/table/dt=2016-10-10
/path/to/table/dt=2016-10-11

But my existing directory layout looks like this:

/path/to/table/2016-10-09
/path/to/table/2016-10-10
/path/to/table/2016-10-11

I don't want to change the existing directory layout, so I am trying to do it the Hive way: create a partitioned table first, then add the partitions myself, so the existing data does not have to be moved into the partition_key=partition_value format.

But the SQL below didn't work:

CREATE TABLE my_table
USING com.databricks.spark.avro
PARTITIONED BY (dt)
OPTIONS (path "/path/to/table");

The SQL command-line tool throws an exception: Error in query: cannot recognize input near 'thrive_event_pt' 'USING' 'com' in table name; line 2 pos 0

Does Spark SQL support creating a partitioned table this way, or is there something else I am missing?

This is probably not supported by Spark yet. I had the same problem with Avro files and bucketed tables on Spark 2.0; it worked once I converted the data to ORC first. So try ORC as the underlying file format instead of Avro. For example, keep ORC files for your "current" data and Avro files for your "archive".

Bucketing and partitioning are fairly new to Spark (SQL). Maybe these features will be supported in the future. Even early versions of Hive (below 2.x) do not support everything surrounding bucketing and creating tables. Partitioning, on the other hand, is an older, more mature feature in Hive.

Spark 2.3 supports this functionality now. If you are using EMR, release 5.13 ships with Spark 2.3.
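
For the "add the partitions myself" step described in the question, here is a minimal sketch that builds one ALTER TABLE ... ADD PARTITION statement per existing date directory. It assumes my_table already exists as a table partitioned by a string column dt, and that your Spark version accepts these statements for that table type. The helper name add_partition_statements is hypothetical, and the snippet only builds the SQL strings, so it runs without Spark.

```python
from typing import List


def add_partition_statements(table: str, base_path: str, dates: List[str]) -> List[str]:
    """Build one ADD PARTITION statement per date directory (hypothetical helper).

    Each partition value dt=<date> is mapped to the existing directory
    <base_path>/<date>, so nothing has to be renamed to dt=<date>.
    """
    base = base_path.rstrip("/")
    return [
        f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION (dt='{d}') "
        f"LOCATION '{base}/{d}'"
        for d in dates
    ]


# Directory names taken from the question's layout.
statements = add_partition_statements(
    "my_table", "/path/to/table", ["2016-10-09", "2016-10-10", "2016-10-11"]
)

for stmt in statements:
    # With an active SparkSession (assumed to be named `spark`),
    # you would call spark.sql(stmt) here instead of printing.
    print(stmt)
```

Pasting the printed statements into the spark-sql command-line tool should work the same way. Building them in a loop mainly helps when there are many date directories.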
