
HashMap as a Broadcast Variable in Spark Streaming?

I have some data that needs to be classified in Spark Streaming. The classification key-values are loaded into a HashMap at the beginning of the program. Each incoming data packet therefore needs to be compared against these keys and tagged accordingly.
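A minimal sketch of the start-up step described above, assuming the classification rules arrive as hypothetical `key=tag` text lines (the question does not say how they are stored). `ClassificationRules` and `parse` are illustrative names, not part of Spark:

```java
import java.util.HashMap;
import java.util.List;

/**
 * Hypothetical helper: parses classification rules written as "key=tag" lines
 * into a HashMap, as the driver might do once at program start-up.
 */
final class ClassificationRules {

    private ClassificationRules() {}

    static HashMap<String, String> parse(List<String> lines) {
        HashMap<String, String> rules = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            // Skip blank lines and comment lines
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            int eq = trimmed.indexOf('=');
            // Reject lines without a key before the '=' separator
            if (eq <= 0) {
                throw new IllegalArgumentException("Malformed rule: " + line);
            }
            rules.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
        }
        return rules;
    }
}
```

In a real application the lines might come from `Files.readAllLines(...)` on the driver; the resulting map is the object that would then be broadcast.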

I realize that Spark has shared variables, called broadcast variables and accumulators, for distributing objects. However, the examples in the tutorials only use simple variables.

How can I share my HashMap with all Spark workers? Alternatively, is there a better way to do this?

I am writing my Spark Streaming application in Java.

In Spark you can broadcast any serializable object the same way. This is the best approach because the data is shipped to each worker only once, and you can then use it in any of the tasks.
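Because a broadcast value must be serializable, one quick way to check a candidate value is a Java serialization round trip. Spark's default serializer (`spark.serializer`) is Java serialization, so this roughly mirrors what happens when the value is shipped to executors; if Kryo is configured instead, the check is only indicative. The class name `SerializationCheck` is hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/**
 * Round-trips an object through Java serialization, roughly what happens when
 * Spark's default JavaSerializer ships a broadcast value to executors.
 */
final class SerializationCheck {

    private SerializationCheck() {}

    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Serialize the value into an in-memory byte array
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        // Deserialize it back into a new, independent object
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }
}
```

A `HashMap<String, String>` passes this check because `HashMap` and `String` both implement `Serializable`; a map holding non-serializable values would fail here with `NotSerializableException` before it ever reaches Spark.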

Scala:

val br = ssc.sparkContext.broadcast(Map(1 -> 2))

Java:

Broadcast<HashMap<String, String>> br = ssc.sparkContext().broadcast(new HashMap<>());

Here is a better example of how you would broadcast a HashMap in Java:

In your Spark application, create or load a HashMap, then use the SparkSession to broadcast it.

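import java.util.HashMap;
import org.apache.spark.broadcast.Broadcast;

HashMap<String, String> bcMap = new HashMap<>();
bcMap.put("key1", "val1");
bcMap.put("key2", "val2");

// From Java, the implicit ClassTag of SparkContext.broadcast must be passed explicitly
Broadcast<HashMap> bcVar = this.sparkSession.sparkContext().broadcast(bcMap, classTag(HashMap.class));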

You also need the following helper method to create a ClassTag:

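import scala.reflect.ClassTag;

// Builds a ClassTag from a Class object so Java code can call Scala APIs that expect one
private static <T> ClassTag<T> classTag(Class<T> clazz) {
    return scala.reflect.ClassManifestFactory.fromClass(clazz);
}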

You can then read the broadcast value inside Spark functions such as map:

HashMap<String, String> bcVal = bcVar.value();
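The lookup itself can be kept in a plain, Spark-independent function that receives the map obtained from the broadcast. This sketch assumes a hypothetical record format of `key,payload` and a hypothetical fallback tag `UNCLASSIFIED`; neither comes from the question, and `RecordTagger` is an illustrative name:

```java
import java.util.Map;

/**
 * Hypothetical per-record classifier: the kind of logic you would call inside
 * map(...) after obtaining the map from the broadcast variable.
 */
final class RecordTagger {

    static final String UNKNOWN = "UNCLASSIFIED";

    private RecordTagger() {}

    /** Prefixes a "key,payload" record with the tag found for its key, or UNKNOWN. */
    static String tag(String record, Map<String, String> rules) {
        int comma = record.indexOf(',');
        // A record without a comma is treated as a bare key
        String key = comma >= 0 ? record.substring(0, comma) : record;
        String tag = rules.getOrDefault(key, UNKNOWN);
        return tag + "|" + record;
    }
}
```

Inside a streaming job it might be called as `lines.map(r -> RecordTagger.tag(r, br.value()))`, where `lines` is an assumed `JavaDStream<String>` and `br` is the `Broadcast<HashMap<String, String>>` from the Java one-liner. Calling `value()` inside the lambda means the closure captures only the broadcast handle, not the map itself.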

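Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost this page, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.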

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM