Installing Apache Nutch on Windows
I am trying to integrate Apache Solr with Apache Nutch 1.14 on Windows 7 (64-bit), but I get an error when I try to run Nutch.
What I have already done:
(I also tried Hadoop WinUtils 2.7.1, without success.)
The error I get:
$ bin/crawl -i -D http://localhost:8983/solr/ -s urls/ TestCrawl 2
Injecting seed URLs
/home/apache-nutch-1.14/bin/nutch inject TestCrawl/crawldb urls/
Injector: starting at 2018-06-20 07:14:47
Injector: crawlDb: TestCrawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:609)
at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:977)
at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:187)
at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:108)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:285)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:344)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:125)
at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at org.apache.nutch.crawl.Injector.inject(Injector.java:417)
at org.apache.nutch.crawl.Injector.run(Injector.java:563)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.Injector.main(Injector.java:528)
Error running:
/home/apache-nutch-1.14/bin/nutch inject TestCrawl/crawldb urls/
Failed with exit value 1.
After downloading the hadoop-core-1.1.2.jar file from http://www.java2s.com/Code/Jar/h/Downloadhadoopcore121jar.htm and copying it into the NUTCH_HOME/lib folder, I get the following error:
$ bin/crawl -i -D http://localhost:8983/solr/ -s urls/ TestCrawl 2
Injecting seed URLs
/home/apache-nutch-1.14/bin/nutch inject TestCrawl/crawldb urls/
Injector: starting at 2018-06-20 23:19:49
Injector: crawlDb: TestCrawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.mapreduce.Job.getInstance(Lorg/apache/hadoop/conf/Configuration;Ljava/lang/String;)Lorg/apache/hadoop/mapreduce/Job;
at org.apache.nutch.crawl.Injector.inject(Injector.java:401)
at org.apache.nutch.crawl.Injector.run(Injector.java:563)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.Injector.main(Injector.java:528)
Error running:
/home/apache-nutch-1.14/bin/nutch inject TestCrawl/crawldb urls/
Failed with exit value 1.
If I do not set the HADOOP_HOME variable, I get the following exception:
Injector: java.io.IOException: (null) entry in command string: null chmod 0644 C:\cygwin64\home\apache-nutch-1.14\TestCrawl\crawldb\.locked
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:869)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:852)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:733)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:225)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:209)
at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:307)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:296)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:328)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:398)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:461)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:854)
at org.apache.hadoop.fs.FileSystem.createNewFile(FileSystem.java:1154)
at org.apache.nutch.util.LockUtil.createLockFile(LockUtil.java:59)
at org.apache.nutch.util.LockUtil.createLockFile(LockUtil.java:81)
at org.apache.nutch.crawl.CrawlDb.lock(CrawlDb.java:178)
at org.apache.nutch.crawl.Injector.inject(Injector.java:398)
at org.apache.nutch.crawl.Injector.run(Injector.java:563)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.Injector.main(Injector.java:528)
Error running:
/home/apache-nutch-1.14/bin/nutch inject TestCrawl//crawldb urls/
Failed with exit value 127.
I would appreciate any help I can get!
When you perform the crawl, just run the following command:
bin/crawl -s urls/ TestCrawl/ 2
Then you can run the indexer like this (the -D option goes with the class):
bin/nutch index -Dsolr.server.url=http://localhost:8983/solr/YOURCORE TestCrawl/crawldb/ -linkdb TestCrawl/linkdb/ TestCrawl/segments/* -filter -normalize -deleteGone
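If you want to check both commands before running them, a small dry-run helper may be useful. This is only a sketch: the function name print_crawl_and_index and its three arguments (crawl directory, Solr URL, number of rounds) are hypothetical and not part of Nutch; the commands it prints are the two shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): prints the crawl and index
# commands for a given crawl directory, Solr URL and number of rounds.
# It only echoes the commands; nothing is executed here.
print_crawl_and_index() {
  crawl_dir=${1%/}   # strip one trailing slash, if present
  solr_url=$2
  rounds=$3
  # Step 1: crawl without indexing (no -i, no -D).
  echo "bin/crawl -s urls/ $crawl_dir/ $rounds"
  # Step 2: index, with -D attached directly to the property name.
  # The * stays literal inside the quotes; the shell that finally runs
  # the command expands it to the segment directories.
  echo "bin/nutch index -Dsolr.server.url=$solr_url $crawl_dir/crawldb/ -linkdb $crawl_dir/linkdb/ $crawl_dir/segments/* -filter -normalize -deleteGone"
}
```

For example, `print_crawl_and_index TestCrawl http://localhost:8983/solr/YOURCORE 2` prints the two commands; once they look right, you could pipe the output to `sh` from the Nutch home directory.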
Alternatively, you can specify the Solr URL in conf/nutch-site.xml:
<property>
<name>solr.server.url</name>
<value>http://localhost:8983/solr/YOURCORE/</value>
<description>Defines the Solr URL into which data should be indexed using the indexer-solr plugin.</description>
</property>
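The property has to sit inside the <configuration> element of nutch-site.xml. To confirm which URL the file actually contains, something like the following might help. It is a rough sketch, not a real XML parser: the function name solr_url_from_conf is made up for illustration, and it assumes <name> and <value> are on separate lines, as in the snippet above.

```shell
#!/bin/sh
# Hypothetical helper: print the value of solr.server.url found in a
# Nutch configuration file passed as the first argument.
# Assumption: the <name> line comes first and the <value> line follows it.
solr_url_from_conf() {
  awk '
    # Remember that the property name was seen, then move to the next line.
    /<name>solr\.server\.url<\/name>/ { found = 1; next }
    # On the first <value> line after it, strip the tags and print the URL.
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}
```

Calling `solr_url_from_conf conf/nutch-site.xml` from the Nutch home directory prints the configured URL, or nothing if the property is missing.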