
Crawl Web Data Using a Web Crawler

I would like to use a web crawler to crawl a particular website. The website is a learning management system where many students upload their assignments, project presentations, and so on. My question is: can I use a web crawler to download the files that have been uploaded to the learning management system? After I download them, I would like to create an index on them so as to query the set of documents, so users can use my application as a search engine. Can a crawler do this? I know about webeater (a crawler written in Java).

  1. Download the files in Java, single-threaded (see the sketch after this list).
  2. Parse the files (you can get ideas from the parse plugins of Nutch).
  3. Create an index with Lucene.
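As a rough illustration of step 1, here is a minimal single-threaded downloader using Java's built-in HttpClient (Java 11+). The file URL and output filename are hypothetical placeholders; in a real crawl you would first discover those links from the LMS pages.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class SingleThreadDownloader {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical file URL from the learning management system
        String fileUrl = "https://lms.example.com/uploads/assignment1.pdf";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(fileUrl)).GET().build();

        // Save the response body straight to disk
        HttpResponse<Path> response = client.send(
                request, HttpResponse.BodyHandlers.ofFile(Path.of("assignment1.pdf")));

        System.out.println("Saved to " + response.body() + " (HTTP " + response.statusCode() + ")");
    }
}
```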

If you want to use a real web crawler, use http://www.httrack.com/

It offers many options for copying websites or content on web pages, including Flash. It works on Windows and Mac.

Then you can do steps 2 and 3 as suggested above.
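If you go the HTTrack route, steps 2 and 3 amount to extracting text from the mirrored files and feeding it to Lucene. Below is a minimal indexing sketch against a recent Lucene release; the directory names are placeholders, and it assumes the files already contain plain text (binary formats such as PDF or PPT would need a text-extraction step first, for example with Apache Tika, which is also what Nutch's parse plugins build on).

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class SimpleIndexer {
    public static void main(String[] args) throws IOException {
        Path docsDir = Path.of("downloaded-files"); // where the crawler/HTTrack stored the files
        Path indexDir = Path.of("lucene-index");    // where the Lucene index will live

        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(FSDirectory.open(indexDir), config);
             Stream<Path> files = Files.walk(docsDir)) {
            files.filter(Files::isRegularFile).forEach(file -> {
                try {
                    Document doc = new Document();
                    // Store the path so search results can link back to the original file
                    doc.add(new StringField("path", file.toString(), Field.Store.YES));
                    // Index the (already extracted) text content of the file
                    doc.add(new TextField("contents", Files.readString(file), Field.Store.NO));
                    writer.addDocument(doc);
                } catch (IOException e) {
                    System.err.println("Skipping " + file + ": " + e.getMessage());
                }
            });
        }
    }
}
```

Once the index is built, querying it (the "search engine" part of your application) is a matter of opening the same index directory with an IndexSearcher and running queries against the "contents" field.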

