Error running Mahout 20newsgroups on a Hadoop single-node cluster

I have configured a Hadoop 1.2.1 single-node cluster and installed Mahout 0.8.

The node seems to be working correctly.

I'm trying to run the Mahout 20newsgroups example on the Hadoop cluster with the cnaivebayes classifier, but I get the following error:

13/11/12 18:31:46 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[/tmp/mahout-work-hduser/20news-all], --keyPrefix=[], --method=[mapreduce], --output=[/tmp/mahout-work-hduser/20news-seq], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
Exception in thread "main" java.io.FileNotFoundException: File does not exist: /tmp/mahout-work-hduser/20news-all
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:558)
    at org.apache.mahout.text.SequenceFilesFromDirectory.runMapReduce(SequenceFilesFromDirectory.java:140)
    at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:89)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:63)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:194)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)

When I check the permissions of the folder, I get this:

hduser@fernandoPC:/usr/local/mahout/core/target$ ls -l /tmp/mahout-work-hduser/
total 14136
drwxr-xr-x 22 hduser hadoop     4096 Nov 12 18:31 20news-all
drwxr-xr-x  4 hduser hadoop     4096 Nov 12 18:09 20news-bydate
-rw-r--r--  1 hduser hadoop 14464277 Nov 12 18:09 20news-bydate.tar.gz

When I run the 20newsgroups example with the sgd classifier, it works correctly. I think that's because it doesn't use map/reduce tasks, so it isn't even running on Hadoop.

I searched on Google and couldn't find a solution.

Does anyone have any idea?

This could be related to a bug in seqdirectory (see MAHOUT-1319), in which 'seqdirectory' ignores the 'PrefixFilter' argument. It should be fixed in Mahout 0.9; in the meantime, could you try changing the following in classify-20newsgroups.sh

  echo "Creating sequence files from 20newsgroups data"
  ./bin/mahout seqdirectory \
    -i ${WORK_DIR}/20news-all \
    -o ${WORK_DIR}/20news-seq -ow

to read as

  echo "Creating sequence files from 20newsgroups data"
  ./bin/mahout seqdirectory \
    -i ${WORK_DIR}/20news-all \
    -o ${WORK_DIR}/20news-seq -ow -xm sequential

Please give that a try.

Source: mahout-user mailing list archives
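
One thing worth checking alongside this: the stack trace in the question goes through DistributedFileSystem.getFileStatus, so the missing path is being looked up in HDFS, while the ls -l output shows the local filesystem. A minimal sketch of that check, assuming the hadoop command is on your PATH and using the work directory from the question (the copy step is an assumption about what you want, not part of the Mahout script):

```shell
#!/bin/sh
# Work directory taken from the question; adjust for your own user.
WORK_DIR=/tmp/mahout-work-hduser

if command -v hadoop >/dev/null 2>&1; then
  # -test -e exits with 0 when the path exists in HDFS.
  if hadoop fs -test -e "${WORK_DIR}/20news-all"; then
    echo "20news-all is present in HDFS"
  else
    echo "20news-all is missing from HDFS; copying it from the local filesystem"
    # Assumption: the local copy shown by ls -l is the data you want in HDFS.
    hadoop fs -put "${WORK_DIR}/20news-all" "${WORK_DIR}/20news-all"
  fi
else
  echo "hadoop is not on PATH; cannot check HDFS"
fi
```

On newer Hadoop releases you may need to create the parent directory first (hadoop fs -mkdir -p) before the put succeeds.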

If you are running Mahout in standalone mode, then you want to set the environment variable MAHOUT_LOCAL. You can add a line

export MAHOUT_LOCAL=1

to your ~/.bashrc file, then run: source ~/.bashrc

This should fix the problem.
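
Note that the variable has to be exported, otherwise the bin/mahout script, which runs as a child process, will not see it. A small sketch of the difference, using sh -c as a stand-in for the child process:

```shell
#!/bin/sh
# Start from a clean state so the result does not depend on the caller's environment.
unset MAHOUT_LOCAL

# Plain assignment: visible in this shell only, not in child processes.
MAHOUT_LOCAL=1
sh -c 'echo "without export: [${MAHOUT_LOCAL}]"'   # prints: without export: []

# Exported: child processes such as ./bin/mahout inherit it.
export MAHOUT_LOCAL=1
sh -c 'echo "with export:    [${MAHOUT_LOCAL}]"'   # prints: with export:    [1]
```

To go back to running on the cluster, unset the variable again (unset MAHOUT_LOCAL) in the same shell.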
