SPARK 1.6 Insert into existing Hive table (non-partitioned)
Given that I can do single-row inserts with the statements below (taken from another Stack Overflow question, thanks):
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("CREATE TABLE IF NOT EXISTS e360_models.employee(id INT, name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'")
sqlContext.sql("insert into table e360_models.employee select t.* from (select 1210, 'rahul', 55) t")
sqlContext.sql("insert into table e360_models.employee select t.* from (select 1211, 'sriram pv', 35) t")
sqlContext.sql("insert into table e360_models.employee select t.* from (select 1212, 'gowri', 59) t")
val result = sqlContext.sql("FROM e360_models.employee SELECT id, name, age")
result.show()
What if I want to insert-select into an existing Hive table from a Spark DataFrame registered as a temporary table? I can't seem to get it to work. Is it actually possible?
Using Spark 1.6. I'm not interested in creating a CTAS table, but rather in inserting as above, just in bulk, e.g.
sqlContext.sql("INSERT INTO TABLE default.ged_555 SELECT t.* FROM mytempTable t")
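For context, in Spark 1.6 this insert-select from a registered temporary table does work, provided the temp table's columns line up positionally with the target table. A minimal sketch, assuming `df` is the DataFrame holding the batch and `sqlContext` is a HiveContext (both names are placeholders, not from a tested setup):

```scala
// Sketch: requires a running Spark 1.6 cluster with Hive support.
// `df` is assumed to be a DataFrame whose columns match default.ged_555 by position.
df.registerTempTable("mytempTable") // Spark 1.6 API (replaced by createOrReplaceTempView in 2.x)
sqlContext.sql("INSERT INTO TABLE default.ged_555 SELECT t.* FROM mytempTable t")
```

If this still fails, the usual culprits are a column-count or type mismatch between the temp table and the Hive table, since positional insert does no name-based matching.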
As far as I understand: you have e360_models.employee, you want to insert some data into e360_models.employee, then select some columns and insert them into default.ged_555, and you don't want to do a CTAS from e360_models.employee. Prepare a DataFrame and then do the following:
// since you are using Hive, use a HiveContext
import org.apache.spark.sql.SaveMode

val dataframe = hiveContext.sql("select * from e360_models.employee")
dataframe.show(10) // verify whether data is in the DataFrame or not
dataframe.printSchema() // print the schema as well, for debugging
dataframe.write.mode(SaveMode.Overwrite).insertInto("default.ged_555")

// read the target table back to verify the result
val sampleDataFrame = hiveContext.sql("select * from default.ged_555")
sampleDataFrame.show() // again print 10 records for debugging
sampleDataFrame.printSchema() // print the target table's schema