Which JBoss and EJB3 features should I use for a web crawler?
Happy New Year, everybody. I am trying to develop my own bot (web crawler) that will crawl the Internet for a search engine. I am thinking of using the JBoss scheduler-service to schedule the bot, and something like this to get content:
URL u = new URL("http://www.google.kz");
InputStream in = u.openStream();
I want to ask which EJB3 or JBoss features I should use to develop my bot effectively (in the right way). I am new to EJB3 and JBoss.
If you have better ideas, you could write them here. I am developing the search engine to practice my Java skills and for academic purposes; I am not going to compete with Google :)
PS: I haven't decided yet how I will parse HTML. I am thinking about something like this: Parse HTML. What can you recommend?
You don't need EJB or JBoss at all. In fact, I can hardly think of a use for them in a web crawler. Only perhaps if you are using JPA to store the results of the crawl; then you can make use of container-managed transactions and the automatic injection of the JPA entity manager. Apart from that, no.
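To illustrate the point, here is a minimal sketch of fetching and scheduling with plain Java SE only, no application server. It is not a full crawler. The class name SimpleCrawler, the helper names readAll and fetch, the UTF-8 assumption, and the one-hour interval are all made up for this example; ScheduledExecutorService stands in for the JBoss scheduler-service mentioned in the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical class name; a sketch, not a complete crawler.
class SimpleCrawler {

    // Reads the whole stream into a String.
    // Assumption: the content is UTF-8; a real crawler should honor the declared charset.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Opens the URL the same way as in the question and closes the stream afterwards.
    static String fetch(String address) throws IOException {
        try (InputStream in = new URL(address).openStream()) {
            return readAll(in);
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("Usage: java SimpleCrawler <start-url>");
            return;
        }
        final String startUrl = args[0];
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Runs immediately, then once per hour (arbitrary interval for illustration).
        scheduler.scheduleAtFixedRate(() -> {
            try {
                String html = fetch(startUrl);
                System.out.println("Fetched " + html.length() + " characters from " + startUrl);
            } catch (Exception e) {
                // Catch everything: an uncaught exception would cancel all later runs.
                System.err.println("Fetch failed: " + e.getMessage());
            }
        }, 0, 1, TimeUnit.HOURS);
    }
}
```

Running it as `java SimpleCrawler http://www.google.kz` would fetch that page once per hour until the process is stopped. If you later store results with JPA, that is the point where a container could start to pay off.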