
Which JBoss and EJB3 features should I use for a web crawler?

Happy New Year, everybody. I am trying to develop my own bot (a web crawler) that will walk the Internet for a search engine. I am thinking of using the JBoss scheduler-service to schedule the bot, and something like this to fetch content:

URL u = new URL("http://www.google.kz");
InputStream in = u.openStream();
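
For context, here is a minimal sketch of how the stream opened above could be read into a string. The class name `PageFetcher` and the methods `readAll` and `fetch` are hypothetical names chosen for illustration, and decoding as UTF-8 is an assumption; a real crawler would probably check the charset declared in the response and set connection timeouts.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original question.
class PageFetcher {

    // Reads every byte from the stream and decodes it as UTF-8
    // (an assumption; real pages may declare a different charset).
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Opens the URL, reads the whole body, and always closes the stream.
    static String fetch(String address) throws IOException {
        try (InputStream in = new URL(address).openStream()) {
            return readAll(in);
        }
    }
}
```

Calling `PageFetcher.fetch("http://www.google.kz")` would return the page body as one string, which can then be handed to whatever HTML parser is chosen.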

I want to ask: which EJB3 or JBoss features should I use to develop my bot effectively (the right way)? I am new to EJB3 and JBoss.

If you have better ideas, you can write them here. I am developing a search engine to practice my Java skills and for academic purposes; I am not going to compete with Google :)

  • jboss-5.1.0.GA
  • Windows XP
  • EJB3
  • Eclipse Helios

PS: I haven't decided yet how I will parse HTML; I am thinking about something like this: Parse HTML. What can you recommend?

You don't need EJB or JBoss at all. In fact, I can hardly think of a use for them in a web crawler. The one exception: if you use JPA to store the crawl results, you can take advantage of container-managed transactions and automatic injection of the JPA entity manager. Apart from that, no.
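
To illustrate the point that no container is required, here is a rough sketch of scheduling a recurring crawl with plain `java.util.concurrent` instead of the JBoss scheduler-service. The class name `CrawlScheduler` and its `start` and `stop` methods are hypothetical, and the crawl itself is just a `Runnable` supplied by the caller; this is one possible approach, not the only one.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a container-provided scheduler.
class CrawlScheduler {

    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    // Runs the task immediately, then again each time the given delay
    // has elapsed after the previous run finished.
    ScheduledFuture<?> start(Runnable crawlTask, long delay, TimeUnit unit) {
        Runnable guarded = () -> {
            try {
                crawlTask.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs,
                // so log it and keep the schedule alive.
                System.err.println("Crawl run failed: " + e);
            }
        };
        return executor.scheduleWithFixedDelay(guarded, 0, delay, unit);
    }

    // Stops scheduling and interrupts a run in progress.
    void stop() {
        executor.shutdownNow();
    }
}
```

For example, `new CrawlScheduler().start(task, 1, TimeUnit.HOURS)` would run `task` once an hour until `stop()` is called. A fixed delay between runs (rather than a fixed rate) is used here so that a slow crawl never overlaps with the next one.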
