
Incremental updates in Hive

Original post: 2016-05-02 19:29:13 | Tags: mysql, hadoop, hive, bigdata

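I have a source MySQL table whose data I have to export to Hive for analytics. Initially, while the MySQL table was small, exporting all of it to Hive with Sqoop was not a problem. Now that the data has grown, how can I set up incremental updates from MySQL into Hive?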

2 answers

You can do incremental updates with Sqoop. The Sqoop documentation covers this well; here is the link: https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_incremental_imports

Here is an example of an incremental update using Hive/Spark.

scala> spark.sql("select * from table1").show
+---+---+---------+
| id|sal|timestamp|
+---+---+---------+
|  1|100|    30-08|
|  2|200|    30-08|
|  3|300|    30-08|
|  4|400|    30-08|
+---+---+---------+

scala> spark.sql("select * from table2").show
+---+----+---------+
| id| sal|timestamp|
+---+----+---------+
|  2| 300|    31-08|
|  4|1000|    31-08|
|  5| 500|    31-08|
|  6| 600|    31-08|
+---+----+---------+

scala> spark.sql("select b.id,b.sal from table1 a full outer join table2 b on a.id = b.id where b.id is not null union select a.id,a.sal from table1 a full outer join table2 b on a.id = b.id where b.id is null").show
+---+----+
| id| sal|
+---+----+
|  4|1000|
|  6| 600|
|  2| 300|
|  5| 500|
|  1| 100|
|  3| 300|
+---+----+
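
To make the rule behind that query easier to see, here is a minimal sketch of the same merge using plain Scala collections instead of Spark. It assumes table1 is the existing snapshot and table2 is the newer increment, which is how the timestamps read but is not stated explicitly. The names SalaryRow and mergeIncrement are hypothetical and exist only for this illustration. The snippet can be pasted into the Scala REPL or spark-shell.

```scala
// Hypothetical row type mirroring the id/sal/timestamp columns shown above.
final case class SalaryRow(id: Int, sal: Int, timestamp: String)

// Merge an existing snapshot with an increment:
//  - an id present in both takes the row from `delta` (the update wins),
//  - an id present on only one side is kept unchanged.
// Assumption: ids are unique within each input.
def mergeIncrement(base: Seq[SalaryRow], delta: Seq[SalaryRow]): Seq[SalaryRow] = {
  val baseById  = base.map(r => r.id -> r).toMap
  val deltaById = delta.map(r => r.id -> r).toMap
  // Map `++` keeps the right-hand value when a key exists on both sides.
  (baseById ++ deltaById).values.toSeq.sortBy(_.id)
}

// Same sample data as table1 and table2 above.
val table1 = Seq(
  SalaryRow(1, 100, "30-08"), SalaryRow(2, 200, "30-08"),
  SalaryRow(3, 300, "30-08"), SalaryRow(4, 400, "30-08"))
val table2 = Seq(
  SalaryRow(2, 300, "31-08"), SalaryRow(4, 1000, "31-08"),
  SalaryRow(5, 500, "31-08"), SalaryRow(6, 600, "31-08"))

val merged = mergeIncrement(table1, table2)
merged.foreach(println)
```

Sorted by id, the merged (id, sal) pairs match the six rows the Spark query returns: ids 2 and 4 carry the updated salaries, ids 5 and 6 are new, and ids 1 and 3 are unchanged.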

I hope this logic works for you.