
Error: org/jdom/JDOMException while running map reduce job

Hi, I am trying to parse an XML file using the MapReduce framework, with the JDOM parser handling the XML parsing. But when I run my MapReduce code on a pseudo-distributed (single-node) cluster, it gives me the following error:

WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
INFO input.FileInputFormat: Total input paths to process : 1
INFO util.NativeCodeLoader: Loaded the native-hadoop library
WARN snappy.LoadSnappy: Snappy native library not loaded
INFO mapred.JobClient: Running job: job_201303281220_0016
INFO mapred.JobClient: map 0% reduce 0%
INFO mapred.JobClient: Task Id : attempt_201303281220_0016_m_000000_0, Status : FAILED
Error: org/jdom/JDOMException
INFO mapred.JobClient: Task Id : attempt_201303281220_0016_m_000000_1, Status : FAILED
Error: org/jdom/JDOMException
INFO mapred.JobClient: Task Id : attempt_201303281220_0016_m_000000_2, Status : FAILED
Error: org/jdom/JDOMException
INFO mapred.JobClient: Job complete: job_201303281220_0016
INFO mapred.JobClient: Counters: 7
INFO mapred.JobClient: Job Counters
INFO mapred.JobClient: SLOTS_MILLIS_MAPS=7541
INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
INFO mapred.JobClient: Launched map tasks=4
INFO mapred.JobClient: Data-local map tasks=4
INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
INFO mapred.JobClient: Failed map tasks=1

I tried downloading the JDOM 1.x jars, but that didn't help; I still get the same error. Any suggestions would be a great help.

NOTE: I am able to run the bundled examples such as word count and Pi, so I think my cluster is set up properly.

Thanks in advance.

You need to confirm that your input file has one XML document per line (i.e., no line feeds inside your XML). It's probable that the map() method is being handed single lines (you're using a FileInputFormat, and with the default TextInputFormat each record is one line), but because of the embedded line feeds, those lines contain only partial XML documents.

For example, if your file looks like this:

<root
    arg1=""
    arg2="">

</root>

Then the map() method will be called once for each of the five lines, including the blank one. None of those lines contains a complete XML document, so a parsing error would be thrown five times, even though the file as a whole really does contain valid XML.
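The behaviour described above can be reproduced without a cluster. The sketch below is a minimal illustration, not Hadoop code: `LineRecordDemo` and its methods are hypothetical names, `BufferedReader.lines()` stands in for the job's line-oriented record reader, and the JDK's built-in DOM parser (`javax.xml.parsers`) is swapped in for JDOM so the example needs no extra jars. It also includes `flatten`, one possible way to produce the one-document-per-line layout, assuming line breaks inside the XML carry no meaning (no CDATA sections or whitespace-sensitive text).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical demo: treats each line of a file as one map() record and parses
 * every record on its own. The JDK's DOM parser stands in for JDOM here.
 */
final class LineRecordDemo {

    /** The five-line example file shown above. */
    static final String SAMPLE = "<root\n    arg1=\"\"\n    arg2=\"\">\n\n</root>";

    /** Returns true if the text is a complete, well-formed XML document. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed input: this record's parse fails
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Splits the text into lines (like a line-oriented reader) and counts records that fail to parse. */
    static long failingRecords(String fileContents) {
        return new BufferedReader(new StringReader(fileContents)).lines()
                .filter(record -> !isWellFormed(record))
                .count();
    }

    /**
     * Hypothetical pre-processing step: collapses each line break, plus the
     * whitespace around it, into a single space so the document fits on one line.
     * Assumes line breaks inside the XML are insignificant.
     */
    static String flatten(String xml) {
        return xml.replaceAll("\\s*\\R\\s*", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println("whole file well-formed:    " + isWellFormed(SAMPLE));
        System.out.println("failing records as-is:     " + failingRecords(SAMPLE));
        System.out.println("failing records flattened: " + failingRecords(flatten(SAMPLE)));
    }
}
```

Running `main` should report that the whole file is well-formed, that all 5 per-line records fail to parse, and that 0 records fail once the document has been flattened to `<root arg1="" arg2=""> </root>` on a single line.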

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
