
Storm Word Count Topology - Concept issue with number of executions

Good afternoon, I am following the Storm-starter WordCountTopology here. For reference, here are the Java files.

This is the main file:

public class WordCountTopology {
  public static class SplitSentence extends ShellBolt implements IRichBolt {

    public SplitSentence() {
      super("python", "splitsentence.py");
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
      declarer.declare(new Fields("word"));
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
      return null;
    }
  }

  public static class WordCount extends BaseBasicBolt {
    Map<String, Integer> counts = new HashMap<String, Integer>();

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
      String word = tuple.getString(0);
      Integer count = counts.get(word);
      if (count == null)
        count = 0;
      count++;
      counts.put(word, count);
      collector.emit(new Values(word, count));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
      declarer.declare(new Fields("word", "count"));
    }
  }

  public static void main(String[] args) throws Exception {

    TopologyBuilder builder = new TopologyBuilder();

    builder.setSpout("spout", new TextFileSpout(), 5);

    builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
    builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));

    Config conf = new Config();
    conf.setDebug(true);

    if (args != null && args.length > 0) {
      conf.setNumWorkers(3);

      StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
    }
    else {
      conf.setMaxTaskParallelism(3);
      LocalCluster cluster = new LocalCluster();
      cluster.submitTopology("word-count", conf, builder.createTopology());
      Thread.sleep(10000);
      cluster.shutdown();
    }
  }
}

Instead of reading from a random String[], I would like the spout to read just one sentence, once:

public class TextFileSpout extends BaseRichSpout {
    SpoutOutputCollector _collector;
    String sentence = "";
    String line = "";
    String splitBy = ",";
    BufferedReader br = null;

    @Override
    public void open(Map conf, TopologyContext context,
            SpoutOutputCollector collector) {
        _collector = collector;

    }

    @Override
    public void nextTuple() {
        Utils.sleep(100);
        sentence = "wordOne wordTwo";
        _collector.emit(new Values(sentence));
        System.out.println(sentence);
    }

    @Override
    public void ack(Object id) {
    }

    @Override
    public void fail(Object id) {
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }

}

This code runs and the output is a lot of threads/emits. The problem is that the program reads that one sentence 85 times instead of just once. I'm guessing this is because the original code is meant to keep emitting new random sentences.

What is causing nextTuple to be called so many times?

You should move the file initialization code into the open method; otherwise, every single time nextTuple is called, your file handle will be initialized again.
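
To make the distinction concrete: Storm calls open once per spout task and then keeps calling nextTuple for as long as the topology runs, which is why the sentence is emitted many times during the 10-second local run. The sketch below models that lifecycle in plain Java. It is not the Storm API: OneShotSpoutSketch, its emitted list and its run driver are hypothetical stand-ins, and the guard flag is just one possible way to emit only once.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a spout; deliberately does NOT use Storm classes. */
class OneShotSpoutSketch {
    final List<String> emitted = new ArrayList<>(); // stands in for the collector
    int openCalls = 0;
    int nextTupleCalls = 0;
    private boolean done;

    // Called once, like open(): one-time setup belongs here.
    void open() {
        openCalls++;
        done = false;
    }

    // Called over and over by the driver loop, like nextTuple().
    void nextTuple() {
        nextTupleCalls++;
        if (done) {
            return; // nothing left to emit, so return without emitting
        }
        emitted.add("wordOne wordTwo");
        done = true;
    }

    // Simulates the framework: open once, then call nextTuple repeatedly.
    static OneShotSpoutSketch run(int iterations) {
        OneShotSpoutSketch spout = new OneShotSpoutSketch();
        spout.open();
        for (int i = 0; i < iterations; i++) {
            spout.nextTuple();
        }
        return spout;
    }
}
```

In a real spout the same guard would sit at the top of nextTuple, around the _collector.emit call. Each spout task would still emit once, so a parallelism hint of 1 on setSpout would presumably be needed to get a single emission overall.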

EDIT:

Inside the open method, do something like:

    br = new BufferedReader(new FileReader(csvFileToRead));

and then the logic to read the file should be inside the nextTuple method:

     while ((line = br.readLine()) != null) {
         // your logic
     }
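
A possible variation on the loop above, since nextTuple is invoked repeatedly anyway, is to read at most one line per call and simply return once the reader is exhausted. The following is a minimal sketch in plain Java, not the Storm API and not necessarily what the answer intended. LineSpoutSketch and its emitted list are hypothetical names, and a StringReader replaces the FileReader only so the example runs without a file on disk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a file-reading spout; does NOT use Storm classes. */
class LineSpoutSketch {
    private BufferedReader br;
    final List<String> emitted = new ArrayList<>(); // stands in for the collector

    // One-time setup: create the reader once, as open() would.
    void open(String contents) {
        br = new BufferedReader(new StringReader(contents));
    }

    // Called repeatedly: emit at most one line per call.
    void nextTuple() {
        try {
            String line = br.readLine();
            if (line != null) {
                emitted.add(line);
            }
            // At end of input readLine() keeps returning null, so later calls emit nothing.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling nextTuple ten times on a two-line input emits each line exactly once; the remaining calls return without emitting.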

