I have tried googling the issue but can't find anything useful.
I am following the tutorial at https://wiki.apache.org/nutch/NutchTutorial
I verified the Nutch installation with bin/nutch and it works.
I have Java 8 installed.
java -version returns
java version "1.8.0_05"
Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)
and added it to PATH using export:
export JAVA_HOME="/cygdrive/c/program files/java/jre8"
export PATH="$JAVA_HOME/bin:$PATH"
Note: I am on Windows, so I am using Cygwin64 as well.
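One thing worth sanity-checking under Cygwin (a sketch using the path exported above): "program files" contains a space, so JAVA_HOME must stay quoted, and it is easy to confirm the variable actually points at a java binary before running Nutch:

```shell
# "program files" contains a space, so keep the expansion quoted;
# an unquoted $JAVA_HOME would word-split inside scripts like bin/nutch.
export JAVA_HOME="/cygdrive/c/program files/java/jre8"
export PATH="$JAVA_HOME/bin:$PATH"

# Sanity check: does the variable point at a real java binary?
if [ -x "$JAVA_HOME/bin/java" ]; then
    echo "java found under $JAVA_HOME"
else
    echo "no java under $JAVA_HOME - check the path" >&2
fi
```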
I created the directory urls and added a file seed.txt containing one URL.
Then I ran
bin/nutch inject crawl/crawldb urls/seed.txt
and got the following error:
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls/seed.txt
Injector: Converting injected urls to crawl db entries.
Injector: java.io.IOException: lock file crawl/crawldb/.locked already exists.
Hi, there are two parts to this problem:
1. There is already a .locked file present in the crawldb folder. Just delete the .locked file.
2. Set the system environment variable Path to include both %JAVA_HOME%\bin and %HADOOP_HOME%\bin, and also set the user environment variables %JAVA_HOME% and %HADOOP_HOME% without the \bin suffix.
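A minimal sketch of the two steps under Cygwin (the crawl directory path is the one from the question; HADOOP_HOME is assumed to already be set):

```shell
# Step 1: delete the stale lock file so inject can run again.
rm -f crawl/crawldb/.locked

# Step 2: make sure both bin directories are on PATH.
# These are the POSIX-style equivalents of the Windows variables above.
export PATH="$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH"
```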
The error message is quite clear: another Nutch job holds a lock on the CrawlDb, or a previous job crashed or was killed before the lock file could be removed. Deleting the lock file crawl/crawldb/.locked
should solve the problem. But it's also good practice to look into the log files (especially hadoop.log) to find out why the lock file wasn't removed.
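For that log check, a small sketch (this demo writes the error line from the question into a scratch file so the commands are runnable anywhere; on a real install, point LOG at logs/hadoop.log inside the Nutch runtime directory):

```shell
# The default log location in a Nutch binary distribution is logs/hadoop.log;
# a scratch file stands in for it here.
LOG="$(mktemp)"
echo "Injector: java.io.IOException: lock file crawl/crawldb/.locked already exists." >> "$LOG"

# Filter the log for failures that may explain the leftover lock:
grep -iE "exception|error" "$LOG" | tail -n 20
```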