
How do I create a Hive external table from Avro files written using Databricks?

The code below shows how the data was written to HDFS using Scala. What is the HQL syntax to create a Hive table to query this data?

import com.databricks.spark.avro._
// write the DataFrame out as Avro files under the given HDFS path
val path = "/user/myself/avrodata"
dataFrame.write.avro(path)

The examples I find require providing an avro.schema.literal describing the schema, or an avro.schema.url pointing to an actual Avro schema file.
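For context, the pattern those examples use looks roughly like the sketch below (the table name, column names, and schema are hypothetical); the schema literal has to be kept in sync with the files by hand, which is the maintenance burden I want to avoid:

CREATE EXTERNAL TABLE my_avro_table
STORED AS AVRO
LOCATION '/user/myself/avrodata'
TBLPROPERTIES ('avro.schema.literal'='{
  "type": "record",
  "name": "MyRecord",
  "fields": [
    {"name": "id",   "type": "long"},
    {"name": "name", "type": "string"}
  ]
}');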

In the spark-shell, all I need to do to read this data is:

scala> import com.databricks.spark.avro._
scala> val df = sqlContext.read.avro("/user/myself/avrodata")
scala> df.show()

So I cheated to get this to work: I registered the data frame as a temporary table, then used an HQL CREATE TABLE ... AS SELECT to create and populate the Avro target table from it. This approach reuses the schema metadata of the temporary table. If the data frame can create a temporary table from its schema, why can it not save the table as Avro directly?

dataFrame.registerTempTable("my_tmp_table")
sqlContext.sql(s"CREATE TABLE ${schema}.${tableName} STORED AS AVRO AS SELECT * FROM my_tmp_table")
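For what it's worth, since Hive 0.14 the AvroSerDe can also derive the Avro schema from the column definitions, so an external table declared directly over the existing path seems like it should work without any schema literal. A minimal sketch, assuming the column names and types match the DataFrame's schema:

CREATE EXTERNAL TABLE my_schema.my_avro_table (
  id   BIGINT,
  name STRING
)
STORED AS AVRO
LOCATION '/user/myself/avrodata';

Because the table is EXTERNAL, dropping it would leave the Avro files written by Spark in place.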
