
What is the role of NUTCH if we are going to make a search engine using Hadoop and Solr?

Original 2012-09-06 15:57:49 9 1 solr/ hadoop/ nutch

I want to build a search engine. I want to crawl some sites and store their indexes and information in Hadoop, and then run searches through Solr. But I am running into a lot of issues. When I search on Google, different people give different suggestions and different ways of configuring a Hadoop-based search engine. These are some of my questions:

1) How will the crawling be done? Is Nutch of any use for the crawling? If yes, how do Hadoop and Nutch communicate with each other?

2) What is the use of Solr? If Nutch does the crawling and stores the crawled indexes and their information in Hadoop, then what is the role of Solr?

3) Can we search using Solr and Nutch? If yes, where will they save the crawled indexes?

4) How does Solr communicate with Hadoop?

5) If possible, please explain step by step how I can crawl some sites, save their information into a DB (Hadoop or any other), and then search it.

I am really stuck with this. Any help will be really appreciated.

A very big thanks in advance. :)

Please help me sort out this huge issue.

1 answer

We are using Nutch as a web crawler and Solr for searching in some production environments. So I hope I can give you some information about 3).

How does this work? Nutch has its own crawl db and a set of websites where it starts crawling. It has plugins that let you configure different things, such as PDF crawling, which fields are extracted from HTML pages, and so on. While crawling, Nutch stores all links extracted from a website and follows them in the next cycle. All crawl results are stored in the crawl db. In Nutch you configure an interval after which crawled results are considered outdated, and the crawler then starts again from the defined start sites.
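To make the cycle above more concrete, here is a toy Python sketch of the idea. It is not Nutch's code or API: `crawl_cycle`, `fetch_links`, the `WEB` link graph, and the numeric timestamps are all made up for illustration. It only models a crawl db that remembers when each URL was last fetched, adds newly found links for the next cycle, and re-fetches a URL once the configured interval has passed.

```python
# Toy model of a crawl cycle. All names here are hypothetical, not Nutch's API.

# A fake "web": each URL maps to the links found on that page.
WEB = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b"],
    "http://example.com/b": [],
}


def fetch_links(url):
    """Pretend to download a page and return the links extracted from it."""
    return WEB.get(url, [])


def crawl_cycle(crawl_db, fetch_links, now, refetch_interval):
    """Fetch every URL that is due and record newly discovered links.

    crawl_db maps url -> time of last fetch (None means never fetched).
    Links found in this cycle are only added to the db; they are fetched
    in the next cycle.
    """
    due = [
        url
        for url, fetched_at in crawl_db.items()
        if fetched_at is None or now - fetched_at >= refetch_interval
    ]
    for url in due:
        crawl_db[url] = now
        for link in fetch_links(url):
            crawl_db.setdefault(link, None)  # keep existing fetch times
    return due


# Start from a single seed URL, with results going stale after 30 time units.
crawl_db = {"http://example.com/": None}
first = crawl_cycle(crawl_db, fetch_links, now=0, refetch_interval=30)
second = crawl_cycle(crawl_db, fetch_links, now=1, refetch_interval=30)
third = crawl_cycle(crawl_db, fetch_links, now=30, refetch_interval=30)
print(first, second, third)
```

In this sketch the first cycle fetches only the seed, the second fetches the two links found on it, and the third re-fetches the seed because its entry has become outdated.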

The results in the crawl db are synchronized to the Solr index, so you are searching on the Solr index. In this setup, Nutch is only there to get data from websites and provide it to Solr.
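Once the data is in Solr, searching is an HTTP request against Solr's `select` handler. The sketch below only builds such a request in Python; the base URL, the core name `nutch`, and the field name `content` are assumptions for illustration and depend on how your Solr instance and schema are set up. The `search` helper is defined but not called, because it needs a running Solr server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_solr_query_url(base_url, core, text, rows=10):
    """Build a URL for Solr's select handler with a JSON response."""
    params = {"q": text, "rows": rows, "wt": "json"}
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


def search(base_url, core, text, rows=10):
    """Run the query and return the matching documents (needs a live Solr)."""
    with urlopen(build_solr_query_url(base_url, core, text, rows)) as response:
        return json.load(response)["response"]["docs"]


# Assumed values: a local Solr, a core named "nutch", a field named "content".
url = build_solr_query_url("http://localhost:8983/solr", "nutch", "content:hadoop")
print(url)
```

Nutch does not appear in this code at all, which matches the description above: the crawler fills the index, and the search side talks only to Solr.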

How do we create a simple search engine using Lucene, Solr or Nutch?

how to make a search engine with nutch and cassandra?

Nutch deployment on hadoop will not index to solr

Can we crawl and index Google Drive documents using nutch and solr?

Using Nutch crawler with Solr

Information on Nutch , Hadoop , Solr, MapReduce and Mahout

Confusion in Apache Nutch, HBase, Hadoop, Solr, Gora

How to search on databases in a hadoop cluster using Solr

Can we search for .txt files in Solr search engine?

Solr Indexing using Nutch Crawler




solution1 1 2012-11-30 14:54:00
