
Integrating Apache Nutch with MySQL on Windows

I am trying to integrate Apache Nutch 2.1 with MySQL server on Windows 8, following the tutorial at http://nlp.solutions.asia/?p=180 . I have made the following changes to apache-nutch-2.1:

  1. Downloaded apache-nutch-2.1-src.zip and extracted it.
  2. Uncommented the following in ivy/ivy.xml:

      <dependency org="mysql" name="mysql-connector-java" rev="5.1.18" conf="*->default"/> 
  3. Commented out the existing SQL properties in conf/gora.properties and added the Gora properties for MySQL:

     gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver
     gora.sqlstore.jdbc.url=jdbc:mysql://localhost:3306/nutch?createDatabaseIfNotExist=true
     gora.sqlstore.jdbc.user=root
     gora.sqlstore.jdbc.password=root
  4. Added properties to conf/nutch-site.xml.
  5. Ran the ant runtime command from the command prompt, which created the /runtime directory.
  6. Added a seeds.txt file to the /runtime/local/urls directory containing www.apache.nutch.org.
  7. Added +^http://([a-z0-9]*.)*nutch.org/ to both domain-urlfilter.txt and regex-urlfilter.txt in the /runtime/local/conf directory.
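
The seed from step 6 and the filter rule from step 7 can be checked against each other outside Nutch. The sketch below uses plain java.util.regex and is not Nutch's own filter code. It assumes that the leading + means "accept" and that a rule matches when the pattern is found in the URL; the class and method names are hypothetical. The pattern is copied exactly as written in step 7, with its dots unescaped.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for a single "+regex" accept rule; not Nutch's filter implementation.
class UrlFilterSketch {

    // Pattern copied from step 7, minus the leading '+' (assumed to mean "accept").
    static final Pattern ACCEPT = Pattern.compile("^http://([a-z0-9]*.)*nutch.org/");

    // Assumption: a rule matches if the pattern is found in the URL string.
    static boolean accepts(String url) {
        return ACCEPT.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] candidates = {
            "www.apache.nutch.org",         // seed exactly as written in step 6
            "http://www.apache.nutch.org/"  // same host with a scheme and trailing slash
        };
        for (String url : candidates) {
            System.out.println(url + " -> " + (accepts(url) ? "accepted" : "rejected"));
        }
    }
}
```

Under these assumptions the bare seed is rejected, because the pattern is anchored on http://, while the form with a scheme and trailing slash is accepted. Whether Nutch treats the seed the same way depends on its URL normalizers and filter chain, which this sketch does not model.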

When I run the crawl command from a Cygwin terminal, the following exception occurs:

   Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Abhijeet\mapred\staging\Abhijeet530509219\.staging to 0700
    at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:50)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:219)
    at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:136)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)

From searching the internet, I understand that Hadoop does not work with Windows, which is fine since I am not using Hadoop to store data; I am using MySQL.

Can anybody suggest what I am doing wrong?

I have used Nutch 2 on both Windows and Linux. To run it on Windows you need this Hadoop 1.0.3 patch installed: https://github.com/congainc/patch-hadoop_7682-1.0.x-win .
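
For context, the top frames of the stack trace (RawLocalFileSystem.mkdirs, FileUtil.setPermission, FileUtil.checkReturnValue) suggest that the job's local staging directory is created through Hadoop's file system layer even when the crawl data goes to MySQL, and that a failed permission change is treated as fatal. The sketch below is not Hadoop's code. It is a minimal illustration, using only java.io.File, of how applying owner-only permissions (0700) can report failure; the class and method names are hypothetical, and the exact sequence of calls Hadoop makes is an assumption.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical illustration; not taken from Hadoop's FileUtil.
class PermissionCheckSketch {

    // Tries to apply owner-only rwx (0700) with java.io.File setters.
    // Returns false if any setter reports that it could not change the permission.
    static boolean trySetOwnerOnly(File dir) {
        // Array initializers are evaluated left to right:
        // clear each permission for everyone, then grant it to the owner only.
        boolean[] results = {
            dir.setReadable(false, false),   dir.setReadable(true, true),
            dir.setWritable(false, false),   dir.setWritable(true, true),
            dir.setExecutable(false, false), dir.setExecutable(true, true)
        };
        for (boolean ok : results) {
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        File staging = Files.createTempDirectory("staging-sketch").toFile();
        try {
            if (!trySetOwnerOnly(staging)) {
                // Message shaped like the one in the stack trace, for comparison only.
                throw new IOException("Failed to set permissions of path: " + staging + " to 0700");
            }
            System.out.println("Permissions applied to " + staging);
        } finally {
            staging.delete();
        }
    }
}
```

The java.io.File documentation notes that setReadable(false, ...) and setExecutable(false, ...) fail when the underlying file system does not implement that permission, which is one plausible reason a check of this kind fails on Windows while succeeding on Linux.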
