
How to send the output of two different Spouts to the same Bolt?

I have two Kafka Spouts whose values I want to send to the same bolt.

Is it possible?

Yes, it is possible:

TopologyBuilder b = new TopologyBuilder();
b.setSpout("topic_1", new KafkaSpout(...));
b.setSpout("topic_2", new KafkaSpout(...));
b.setBolt("bolt", new MyBolt(...)).shuffleGrouping("topic_1").shuffleGrouping("topic_2");

You can use any other grouping, too.
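As an illustration of why the choice of grouping matters when two spouts feed one bolt, here is a minimal plain-Java sketch of the idea behind a fields grouping: the target task is derived from the grouping key only, so tuples with the same key reach the same bolt task no matter which spout emitted them. This is a conceptual model, not Storm's actual implementation; the class `FieldsGroupingSketch` and the methods `chooseTask` and `route` are hypothetical names introduced for this example.

```java
public class FieldsGroupingSketch {

    // Conceptual model only: derive a task index from the grouping key.
    // Math.floorMod keeps the result in [0, numTasks) even for negative hashes.
    static int chooseTask(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    // The source component is deliberately ignored: only the key decides
    // which task receives the tuple.
    static int route(String sourceComponent, String key, int numTasks) {
        return chooseTask(key, numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4; // assumed parallelism of the consuming bolt
        String[][] tuples = {
            {"topic_1", "user-42"},
            {"topic_2", "user-42"},
            {"topic_2", "user-7"},
        };
        for (String[] t : tuples) {
            System.out.println(t[0] + " / " + t[1] + " -> task " + route(t[0], t[1], numTasks));
        }
    }
}
```

With a shuffle grouping, by contrast, tuples are spread over the bolt's tasks without regard to any key.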

Update:

To distinguish tuples (i.e., from topic_1 or topic_2) in the consumer bolt, there are two possibilities:

1) You can use operator IDs (as suggested by @user-4870385):

// in MyBolt.execute():
if (input.getSourceComponent().equals("topic_1")) {
    // tuple came from the spout registered as "topic_1"
} else {
    // tuple came from the spout registered as "topic_2"
}

2) You can use stream names (as suggested by @zenbeni). In this case, both spouts need to declare named streams, and the bolt needs to connect to the spouts by stream name:

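public class MyKafkaSpout extends KafkaSpout {
  // name of the output stream this spout instance declares
  private final String streamName;

  public MyKafkaSpout(String stream) {
    // NOTE: KafkaSpout has no no-arg constructor; a real subclass must also
    // pass its SpoutConfig to super(...). Omitted here for brevity.
    this.streamName = stream;
  }

  // other stuff omitted

  @Override
  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    // compare KafkaSpout.declareOutputFields(...):
    // same fields, but declared on a named stream instead of the default one
    declarer.declare(streamName, _spoutConfig.scheme.getOutputFields());
  }
}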

When building the topology, the stream names now need to be used:

TopologyBuilder b = new TopologyBuilder();
b.setSpout("topic_1", new MyKafkaSpout("stream_t1"));
b.setSpout("topic_2", new MyKafkaSpout("stream_t2"));
b.setBolt("bolt", new MyBolt(...)).shuffleGrouping("topic_1", "stream_t1").shuffleGrouping("topic_2", "stream_t2");

In MyBolt, the stream name can now be used to distinguish input tuples:

// in MyBolt.execute():
if (input.getSourceStreamId().equals("stream_t1")) {
  // tuple arrived on stream "stream_t1" (from spout "topic_1")
} else {
  // tuple arrived on stream "stream_t2" (from spout "topic_2")
}

Discussion: 讨论:

While the second approach using stream names is more natural (according to @zenbeni), the first is more flexible (IMHO). Stream names are declared by the spout/bolt directly (i.e., at the time the spout/bolt code is written); in contrast, operator IDs are assigned when the topology is put together (i.e., at the time the spout/bolt is used).

Let's assume we get three bolts as class files (no source code). The first two should be used as producers, and both declare output streams with the same name. If the third bolt, the consumer, distinguishes input tuples by stream name, this will not work. Even if the two producer bolts declare different output stream names, the expected input stream names might be hard-coded in the consumer bolt and might not match, so this does not work either. However, if the consumer bolt uses component names (even hard-coded ones) to distinguish incoming tuples, the expected component IDs can be assigned correctly when the topology is assembled.
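The scenario above can be modelled in plain Java without any Storm dependency. The sketch below is only an illustration under stated assumptions: `NamingSketch`, `Meta`, `emit`, `byComponent`, and `byStream` are hypothetical names, and the shared stream name "default" stands in for whatever name the two closed-source producers happen to declare.

```java
public class NamingSketch {

    // Hypothetical stand-in for the metadata a tuple carries.
    static final class Meta {
        final String sourceComponent; // chosen when the topology is assembled
        final String sourceStream;    // fixed inside the producer's compiled code

        Meta(String sourceComponent, String sourceStream) {
            this.sourceComponent = sourceComponent;
            this.sourceStream = sourceStream;
        }
    }

    // Models a producer emitting a tuple: the assembler picks componentId,
    // while producerStream is whatever the producer class declared.
    static Meta emit(String componentId, String producerStream) {
        return new Meta(componentId, producerStream);
    }

    // Consumer with hard-coded component IDs.
    static String byComponent(Meta m) {
        return m.sourceComponent.equals("topic_1") ? "first" : "second";
    }

    // Consumer with hard-coded stream names.
    static String byStream(Meta m) {
        return m.sourceStream.equals("stream_t1") ? "first" : "second";
    }

    public static void main(String[] args) {
        // Both producers declare the same stream name; only the IDs differ.
        Meta a = emit("topic_1", "default");
        Meta b = emit("topic_2", "default");

        // Component IDs can be chosen to match what the consumer expects.
        System.out.println(byComponent(a) + " / " + byComponent(b));
        // Stream names cannot be changed, so both tuples look the same.
        System.out.println(byStream(a) + " / " + byStream(b));
    }
}
```

The consumer that checks component IDs tells the two inputs apart because the assembler picked IDs it expects; the consumer that checks stream names cannot, since both producers emit on the same stream.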

Of course, it would be possible to inherit from the given classes (if they are not declared final) and override declareOutputFields(...) in order to assign your own stream names. However, this is additional work.

Yes, it's possible. You can have any number of spouts talking to the same bolt. See the "Streams" section of https://storm.apache.org/documentation/Tutorial.html.
