
Convert dataframe to hive table in spark scala

I am trying to convert a dataframe to a Hive table in Spark Scala. I have read a dataframe in from an XML file, using the SQL context to do so. I want to save this dataframe as a Hive table. I am getting this error:

"WARN HiveContext$$anon$1: Could not persist `database_1`.`test_table` in a Hive compatible way. Persisting it into Hive metastore in Spark SQL specific format."

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object spark_conversion {
  def main(args: Array[String]): Unit = {

    if (args.length < 2) {
      System.err.println("Usage: <input file> <output dir>")
      System.exit(1)
    }
    val in_path = args(0)
    val out_path_csv = args(1)
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("conversion")
    val sc = new SparkContext(conf)

    val hiveContext = new HiveContext(sc)

    // Read the XML file into a dataframe
    val df = hiveContext.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "PolicyPeriod")
      .option("attributePrefix", "attr_")
      .load(in_path)

    // Write the dataframe out as CSV
    df.write
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .save(out_path_csv)

    // This call produces the warning above
    df.saveAsTable("database_1.test_table")

    df.printSchema()
    df.show()
  }
}

saveAsTable in Spark is not compatible with Hive. I am on CDH 5.5.2. Workaround from the Cloudera website:

df.registerTempTable(tempName)
hsc.sql(s"""
  CREATE TABLE $tableName (
    // field definitions
  )
  STORED AS $format
""")
hsc.sql(s"INSERT INTO TABLE $tableName SELECT * FROM $tempName")
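The `// field definitions` placeholder above has to be filled in with the table's columns. One way to avoid writing them out by hand is to generate the DDL string from the dataframe's schema. This is only a sketch under the assumption that the dataframe's Spark SQL type names map directly to Hive types; the `HiveSaveWorkaround` object and `buildCreateDdl` helper are hypothetical names, not part of Cloudera's workaround:

```scala
object HiveSaveWorkaround {
  // Build the CREATE TABLE DDL from (columnName, hiveType) pairs so the
  // "// field definitions" placeholder does not have to be hand-written.
  def buildCreateDdl(table: String,
                     cols: Seq[(String, String)],
                     format: String): String = {
    val fieldDefs = cols.map { case (name, tpe) => s"  `$name` $tpe" }
                        .mkString(",\n")
    s"CREATE TABLE $table (\n$fieldDefs\n) STORED AS $format"
  }
}
```

With a `HiveContext` in scope it could be wired into the workaround roughly like this (again, an assumption-laden sketch, not tested against a cluster):

```scala
// val cols = df.schema.fields.map(f => (f.name, f.dataType.simpleString))
// df.registerTempTable("tmp_policy")
// hiveContext.sql(HiveSaveWorkaround.buildCreateDdl("database_1.test_table", cols, "PARQUET"))
// hiveContext.sql("INSERT INTO TABLE database_1.test_table SELECT * FROM tmp_policy")
```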

http://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_spark_ki.html
