
Indexing Nutch with Solr

I'm very new to Nutch and Solr. I need to download the content of PDF files from a specific URL, but I'm getting an error in Nutch. Can anyone help me with this? Thanks in advance.

$ bin/nutch generate crawl/crawldb crawl/segments
Generator: starting at 2018-10-16 11:28:09
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: running in local mode, generating exactly one partition.
Generator job did not succeed, job status:FAILED, reason: NA
Generator: java.lang.RuntimeException: Generator job did not succeed, job status:FAILED, reason: NA
    at org.apache.nutch.crawl.Generator.generate(Generator.java:802)
    at org.apache.nutch.crawl.Generator.run(Generator.java:1008)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.crawl.Generator.main(Generator.java:957)

Based on your log file, your nutch-site.xml is not a valid XML document, and index-writers.xml is not properly configured.
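A quick way to confirm whether a config file is well-formed XML is to try parsing it. The sketch below uses Python's standard library; the function names are made up for illustration, and the paths conf/nutch-site.xml and conf/index-writers.xml in the usage note are an assumption about where your Nutch configuration lives. It only checks that the XML parses, not that the index writer settings themselves are correct.

```python
import xml.etree.ElementTree as ET


def check_well_formed(path):
    """Return None if the file parses as XML, otherwise a short error message."""
    try:
        ET.parse(path)
    except ET.ParseError as exc:
        # ParseError carries the line and column of the first syntax problem.
        return f"{path}: not well-formed XML ({exc})"
    except OSError as exc:
        return f"{path}: cannot read file ({exc})"
    return None


def find_problems(paths):
    """Check several files and return the list of error messages (empty if all parse)."""
    return [msg for msg in (check_well_formed(p) for p in paths) if msg]
```

Called from the Nutch directory as find_problems(["conf/nutch-site.xml", "conf/index-writers.xml"]), it should return an empty list once both files parse; otherwise each message points at the line and column where parsing failed.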

I suggest looking at the log file, reading the IndexWriters documentation (https://wiki.apache.org/nutch/IndexWriters), fixing both configuration files, rerunning the command, and checking the log again.
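The console output usually shows only "job status:FAILED", so the real cause has to be read from the log. As a rough sketch, the helper below pulls out the lines that look like errors. The default path logs/hadoop.log and the marker strings are assumptions about a typical local Nutch setup with its default log4j layout; adjust them if your installation logs elsewhere.

```python
# Substrings that typically mark a problem in a log4j-style log (assumed layout).
MARKERS = (" ERROR ", " FATAL ", "Exception", "Caused by:")


def error_lines(lines):
    """Return the lines (without trailing newline) that contain any marker."""
    return [line.rstrip("\n") for line in lines if any(m in line for m in MARKERS)]


def scan_log(path="logs/hadoop.log"):
    """Read the log at `path` (assumed default location) and return its error lines."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return error_lines(fh)
```

Printing the last few entries of scan_log() after each rerun makes it easy to see whether the configuration error is gone or has been replaced by a different one.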
