
Apache Nutch is not crawling any more

I have a two-machine cluster. Nutch is configured on one machine, and HBase and Hadoop are configured on the other. Hadoop runs in fully distributed mode and HBase in pseudo-distributed mode. I have crawled about 280 GB of data. But now when I start crawling, it prints the following messages and no longer crawls into the previous table:

INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule

and the following error:

ERROR store.HBaseStore - [Ljava.lang.StackTraceElement;@7ae0c96b

Documents are fetched, but they are not saved in HBase. However, if I crawl data into a new table, it works well and crawls properly without any error. I don't think this is a connection problem, since it works for a new table. I suspect it is caused by some configuration property.

Can anyone guide me? I am not an expert in Apache Nutch.

This is not my area of expertise, but it looks like thread exhaustion on the underlying machine.

I was facing a similar problem. The actual cause was the regionserver (the HBase daemon): with the default settings it shuts down when there is too much data in HBase. So try restarting it, as sketched below. For more information, see the regionserver's log files.
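
For reference, this is roughly how you can check, restart, and inspect the regionserver, assuming a standard HBase installation with HBASE_HOME pointing at the install directory (exact paths and log file names depend on your setup):

    # check whether the HRegionServer process is still running
    jps

    # restart the regionserver daemon (stop is harmless if it already died)
    $HBASE_HOME/bin/hbase-daemon.sh stop regionserver
    $HBASE_HOME/bin/hbase-daemon.sh start regionserver

    # inspect the regionserver log for the reason it shut down
    tail -n 200 $HBASE_HOME/logs/hbase-*-regionserver-*.log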
