
How to get the Hive partition column names using Spark

I have a table in Hive whose DDL looks like this:

CREATE TABLE ABC( name string) PARTITIONED BY ( col1 string, col2 bigint, col3 string, col4 string)

I have a requirement where I need to store the non-partition column names of the Hive table in variable1 and the partition column names in variable2, using Spark with Scala.

Desired output would be :

 variable1='name'    

 variable2='col1,col2,col3,col4'

I am following the approach below, but it does not give me the desired output.

val df=sql("desc default.ABC")

val df2=df.map(r => r.getString(0)).collect.toList

List[String] = List(name, col1, col2, col3, col4, # Partition Information, # col_name, col1, col2, col3, col4)

Can you please help me with the approach?

Try this:

import org.apache.spark.sql.functions._

val partitionColumns = spark.catalog.listColumns("wikicc")
  .where(col("isPartition") === true)
  .select("name").collect().map(_.getAs[String]("name"))

val nonPartitionColumns = spark.catalog.listColumns("wikicc")
  .where(col("isPartition") === false)
  .select("name").collect().map(_.getAs[String]("name"))

println(partitionColumns.mkString(","))
println(nonPartitionColumns.mkString(","))
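To get exactly the two strings the question asks for, the same catalog lookup can be joined into comma-separated variables. This is a sketch only; it assumes the question's table default.ABC is registered in the metastore and that a SparkSession named `spark` is available:

```scala
import org.apache.spark.sql.functions.col

// Sketch: build variable1/variable2 from the catalog metadata.
// Assumes the table default.ABC from the question exists.
val cols = spark.catalog.listColumns("default.ABC")

// Non-partition columns, e.g. "name"
val variable1 = cols.where(col("isPartition") === false)
  .select("name").collect().map(_.getString(0)).mkString(",")

// Partition columns, e.g. "col1,col2,col3,col4"
val variable2 = cols.where(col("isPartition") === true)
  .select("name").collect().map(_.getString(0)).mkString(",")
```

This avoids parsing the `desc` output by hand, since the catalog exposes the partition flag directly per column.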

Try this:

val df=sql("show partitions default.ABC")

This gives you a 'partition' column; collect the rows and extract what you need from them.
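The partition column names can then be parsed out of the first row of the SHOW PARTITIONS result. This is a sketch; it assumes the table already has at least one partition written, since the statement returns no rows otherwise:

```scala
// Sketch: each row of SHOW PARTITIONS looks like "col1=a/col2=1/col3=x/col4=y".
// Assumes default.ABC has at least one partition.
val firstPartition = spark.sql("show partitions default.ABC")
  .head().getString(0)

// Keep only the column name to the left of each '='.
val variable2 = firstPartition.split("/").map(_.split("=")(0)).mkString(",")
```

Note that this approach cannot produce variable1, since SHOW PARTITIONS only reports partition values, not the table's data columns.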

If you want to see how the table partitioning was defined, you can execute via spark-sql:

show create table tableName

and read the 'createtab_stmt' column of the resulting Dataset.
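The returned DDL string can then be searched for the PARTITIONED BY clause. This is a fragile string-parsing sketch, since the exact formatting of the generated DDL varies across Spark and Hive versions:

```scala
// Sketch: read the generated DDL and pull the partition columns out of it.
val ddl = spark.sql("show create table default.ABC")
  .head().getString(0)  // the 'createtab_stmt' column

// Extract "col1 string, col2 bigint, ..." from "PARTITIONED BY (...)"
// and keep only the column names, stripping any backquotes.
val variable2 = """PARTITIONED BY \(([^)]*)\)""".r
  .findFirstMatchIn(ddl)
  .map(_.group(1).split(",")
    .map(_.trim.split("\\s+")(0).stripPrefix("`").stripSuffix("`"))
    .mkString(","))
  .getOrElse("")
```

Prefer the catalog-based answer above where possible; parsing DDL text is a last resort.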
