

How to customize StormCrawler?

Although I have been working with Apache Storm for a while, I am fairly new to StormCrawler.

I started a project from the StormCrawler+ES archetype. However, to customize StormCrawler, at what specific point should additional bolts be added?

Dave.

One way of doing it would be to write a custom bolt and add it between the fetcher and the parser. It should look at the metadata for any mimetype given in the HTTP response (bearing in mind the prefix used to store the info from the protocol), possibly detecting the mimetype as is done in the JSOUPParser. If it is an image, do your specific processing for it, then emit to the output. If it isn't, emit to a custom stream. The latter would be connected to the JSOUP parser so that you get outlinks; the former goes into ES.
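As a rough illustration of the routing decision described above, here is a minimal sketch in plain Java with no Storm or StormCrawler dependencies. The class name `MimeRouter`, the stream name `to_parser`, the `protocol.` prefix and the `content-type` key are all assumptions made for the example; check your own configuration for the prefix under which protocol metadata is actually stored.

```java
import java.util.Locale;
import java.util.Map;

/** Sketch of the decision a custom bolt could make between the fetcher and the parser. */
final class MimeRouter {
    // Hypothetical stream names; a real bolt would have to declare both of them.
    static final String IMAGE_STREAM = "default";   // images: processed, then on to indexing
    static final String HTML_STREAM = "to_parser";  // everything else: on to the parser

    /**
     * Reads the mimetype stored under protocolPrefix + "content-type".
     * Both the prefix and the key name are assumptions about the configuration.
     */
    static String route(Map<String, String[]> metadata, String protocolPrefix) {
        String[] values = metadata.get(protocolPrefix + "content-type");
        if (values == null || values.length == 0 || values[0] == null) {
            return HTML_STREAM; // no mimetype known: let the parser deal with it
        }
        String mime = values[0].trim().toLowerCase(Locale.ROOT);
        return mime.startsWith("image/") ? IMAGE_STREAM : HTML_STREAM;
    }

    public static void main(String[] args) {
        Map<String, String[]> md =
                Map.of("protocol.content-type", new String[] {"image/png"});
        System.out.println(route(md, "protocol."));
    }
}
```

In an actual bolt, the returned name would be used as the stream id when emitting the tuple, and both streams would be declared in `declareOutputFields`. The parser bolt would then subscribe to the custom stream and the indexer to the default one.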

You can find examples of dealing with non-default streams in various places, in particular the Tika module.


 