
Nutch crawl is not working

I want to crawl a site using Apache Nutch 1.12 and index the data into Apache Solr. I have followed this tutorial.

My seed.txt file contains this URL: http://nutch.apache.org/

My regex URL filter contains this rule: +^http://([a-z0-9]*.)*nutch.apache.org/
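As a rough sanity check of that rule, here is a minimal Python sketch that approximates how a Nutch-style regex URL filter decides on a URL: the first rule whose pattern is found in the URL wins, `+` accepts, `-` rejects, and a URL matching no rule is dropped. The helper name `accepts` and the sample URLs are hypothetical, and Python's `re` stands in for the Java regex engine Nutch uses, so treat the result as indicative only.

```python
import re

# Rule copied from the question. The dots are unescaped there,
# so each "." matches any character, not only a literal dot.
RULES = ["+^http://([a-z0-9]*.)*nutch.apache.org/"]


def accepts(url, rules=RULES):
    """Hypothetical helper: True if the first matching rule is a '+' rule."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched, so the URL is dropped


if __name__ == "__main__":
    for url in (
        "http://nutch.apache.org/",
        "http://nutch.apache.org/downloads.html",
        "https://nutch.apache.org/",
        "http://example.com/",
    ):
        print(url, accepts(url))
```

Under these assumptions the rule accepts plain `http://` pages on nutch.apache.org, but rejects `https://` URLs and other hosts.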

When I run the fetch, only the URL from my seed.txt file is fetched:

Fetcher: starting at 2017-01-03 09:56:23
Fetcher: segment: crawl/segments/20170103095613
Fetcher: threads: 10
Fetcher: time-out divisor: 2
QueueFeeder finished: total 2 records + hit by time limit :0
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
fetching http://nutch.apache.org/ (queue crawl delay=5000ms)
Thread FetcherThread has no more work available
-finishing thread FetcherThread, activeThreads=2
Using queue mode : byHost
Using queue mode : byHost
Thread FetcherThread has no more work available
-finishing thread FetcherThread, activeThreads=2
Using queue mode : byHost
Thread FetcherThread has no more work available
-finishing thread FetcherThread, activeThreads=2
Using queue mode : byHost
Thread FetcherThread has no more work available
-finishing thread FetcherThread, activeThreads=2
Using queue mode : byHost
Thread FetcherThread has no more work available
-finishing thread FetcherThread, activeThreads=2
Using queue mode : byHost
Thread FetcherThread has no more work available
-finishing thread FetcherThread, activeThreads=2
Using queue mode : byHost
Thread FetcherThread has no more work available
-finishing thread FetcherThread, activeThreads=2
Fetcher: throughput threshold: -1
Fetcher: throughput threshold retries: 5
Thread FetcherThread has no more work available
-finishing thread FetcherThread, activeThreads=2
robots.txt whitelist not configured.
robots.txt whitelist not configured.
-activeThreads=2, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=2
Thread FetcherThread has no more work available
Thread FetcherThread has no more work available
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=0
-activeThreads=0

What am I missing here?

I ran the fetch step again and got the expected results.
