
SPARK 1.6 Insert into existing Hive table (non-partitioned)

Given that I can do single-row inserts like the ones below (courtesy of another Stack Overflow question, thanks):

  val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
  sqlContext.sql("CREATE TABLE IF NOT EXISTS e360_models.employee(id INT, name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'")

   sqlContext.sql("insert into table e360_models.employee select t.* from (select 1210, 'rahul', 55) t")
   sqlContext.sql("insert into table e360_models.employee select t.* from (select 1211, 'sriram pv', 35) t")
   sqlContext.sql("insert into table e360_models.employee select t.* from (select 1212, 'gowri', 59) t")

   val result = sqlContext.sql("FROM e360_models.employee SELECT id, name, age")
   result.show()

What if I want to do an insert-select into an existing Hive table from a Spark DataFrame that has been registered as a temp table? I can't seem to get it to work. Is it actually possible?

Using Spark 1.6. I'm not interested in creating the table via CTAS; I want to insert the way shown above, but in bulk, e.g.

sqlContext.sql("INSERT INTO TABLE default.ged_555 SELECT t.* FROM mytempTable t")
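For context, the temp-table setup behind that statement looks roughly like this sketch (the DataFrame contents are made up for illustration; `mytempTable` and `default.ged_555` are the names from the statement above, and `sc` is the usual SparkContext):

```scala
import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)
import sqlContext.implicits._

// A stand-in DataFrame; in practice it would come from the real pipeline
val df = sc.parallelize(Seq((1210, "rahul", 55), (1211, "sriram pv", 35)))
  .toDF("id", "name", "age")

// Register it so plain SQL can see it, then bulk insert-select into the Hive table
df.registerTempTable("mytempTable")
sqlContext.sql("INSERT INTO TABLE default.ged_555 SELECT t.* FROM mytempTable t")
```

`registerTempTable` is the Spark 1.6 API; later versions renamed it to `createOrReplaceTempView`.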

As I understand it, you have some data in e360_models.employee, and you want to select some columns from it and insert them into default.ged_555 without doing a CTAS. Prepare a DataFrame from e360_models.employee and then do the following:

// Since you are using Hive, use a HiveContext
import org.apache.spark.sql.SaveMode

val dataframe = hiveContext.sql("select * from e360_models.employee")

dataframe.show(10)       // verify that data is actually in the DataFrame
dataframe.printSchema()  // print the schema as well, for debugging

// Overwrite the existing table with the DataFrame's rows
dataframe.write.mode(SaveMode.Overwrite).insertInto("default.ged_555")

// Read the target table back to verify the result
val sampleDataFrame = hiveContext.sql("select * from default.ged_555")

// again show 10 records to verify the result, for debugging
sampleDataFrame.show(10)
// and print the schema of the target table
sampleDataFrame.printSchema()
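Note that the question's `INSERT INTO` semantics keep existing rows, whereas `SaveMode.Overwrite` replaces them. If you want to append instead, a one-line sketch (assuming the same `dataframe` and target table as above; the column order of the DataFrame must match the table):

```scala
// Append the DataFrame's rows to default.ged_555 instead of overwriting it
dataframe.write.mode(org.apache.spark.sql.SaveMode.Append).insertInto("default.ged_555")
```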
