
Spark Streaming - Refresh Static Data

I have a Spark Streaming job which, when it starts, queries Hive and creates a Map[Int, String] object that is then used for parts of the calculations the job performs.

The problem I have is that the data in Hive can change every 2 hours. I would like to be able to refresh this static data on a schedule, without having to restart the Spark job every time.

The initial load of the Map object takes around one minute.

Any help is very welcome.

You can use a listener, which will be triggered every time a job is started for any stream within the Spark context. Since your database is only updated every two hours, there is no harm in reloading the data on every job start, AFAIK.

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}

sc.addSparkListener(new SparkListener() {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    // Reload the Map[Int, String] from Hive here; the refreshed map
    // will be captured by the closures sent to the executors.
  }
})
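If reloading on every job start is too aggressive (the initial load takes about a minute), an alternative is to refresh the map on a timer in the driver and hand out the latest snapshot via an `AtomicReference`. This is a minimal sketch, not Spark-specific code: `loadFromHive` is a hypothetical stand-in for your actual Hive query, and you would call `StaticDataCache.get` inside your transformations to pick up the current snapshot.

```scala
import java.util.concurrent.atomic.AtomicReference
import java.util.concurrent.{Executors, TimeUnit}

object StaticDataCache {
  // Latest snapshot of the static data; reads are lock-free.
  private val cache = new AtomicReference[Map[Int, String]](Map.empty)

  // The loader is injected so the Hive query stays out of this sketch.
  @volatile private var loader: () => Map[Int, String] = () => Map.empty

  // Current snapshot; call this inside your streaming transformations.
  def get: Map[Int, String] = cache.get()

  // Re-run the loader and atomically swap in the new map.
  def refresh(): Unit = cache.set(loader())

  // Do an initial synchronous load, then refresh every `intervalHours`
  // (2 hours here, matching how often the Hive data changes).
  def start(loadFn: () => Map[Int, String], intervalHours: Long = 2): Unit = {
    loader = loadFn
    refresh()
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    scheduler.scheduleAtFixedRate(
      new Runnable { def run(): Unit = refresh() },
      intervalHours, intervalHours, TimeUnit.HOURS)
  }
}
```

One caveat with either approach: if the map is shipped to executors as a plain closure variable, each batch serializes the driver-side snapshot, so executors see the refreshed data on the next batch. If you instead broadcast the map, you must re-broadcast after each refresh, since a broadcast variable is immutable once created.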
