
ArrayIndexOutOfBoundsException in KMeans while running on Hadoop

I am trying to run the KMeans algorithm on Hadoop using Eclipse. I followed this procedure:

http://www.slideshare.net/titusdamaiyanti/hadoop-installation-k-means-clustering-mapreduce?qid=44b5881c-089d-474b-b01d-c35a2f91cc67&v=qf1&b=&from_search=1#likes-panel

In this example the data is hardcoded, so no external data file is needed. When I run the program, I get an ArrayIndexOutOfBoundsException in the measureDistance method of DistanceMeasurer. I don't understand why this error occurs. Here is the code for DistanceMeasurer:

package com.clustering.model;
public class DistanceMeasurer {
    public static final double measureDistance(ClusterCenter center, Vector v) {
        double sum = 0;
        int length = v.getVector().length;
        for (int i = 0; i < length; i++) {
            sum += Math.abs(center.getCenter().getVector()[i] - v.getVector()[i]);
        }
        return sum;
    }
}

The console output in Eclipse looks like this:

15/03/18 12:26:15 INFO input.FileInputFormat: Total input paths to process : 1

15/03/18 12:26:16 INFO mapred.JobClient: Running job: job_local1627424039_0001

15/03/18 12:26:16 INFO mapred.LocalJobRunner: Waiting for map tasks
15/03/18 12:26:16 INFO mapred.LocalJobRunner: Starting task: attempt_local1627424039_0001_m_000000_0

15/03/18 12:26:16 INFO util.ProcessTree: setsid exited with exit code 0
15/03/18 12:26:16 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@a0e0e1
15/03/18 12:26:16 INFO mapred.MapTask: Processing split: file:/home/hduser/workspace/KMeansClustering/files/clustering/import/data:0+558
15/03/18 12:26:16 INFO mapred.MapTask: io.sort.mb = 100
15/03/18 12:26:16 INFO mapred.MapTask: data buffer = 79691776/99614720
15/03/18 12:26:16 INFO mapred.MapTask: record buffer = 262144/327680
15/03/18 12:26:17 INFO compress.CodecPool: Got brand-new decompressor
15/03/18 12:26:17 INFO mapred.JobClient:  map 0% reduce 0%

15/03/18 12:26:17 INFO compress.CodecPool: Got brand-new decompressor
15/03/18 12:26:17 INFO mapred.MapTask: Starting flush of map output
15/03/18 12:26:17 INFO mapred.LocalJobRunner: Map task executor complete.
15/03/18 12:26:17 WARN mapred.LocalJobRunner: job_local1627424039_0001

java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
    at com.clustering.model.DistanceMeasurer.measureDistance(DistanceMeasurer.java:9)
    at com.clustering.mapreduce.KMeansMapper.map(KMeansMapper.java:56)
    at com.clustering.mapreduce.KMeansMapper.map(KMeansMapper.java:1)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
15/03/18 12:26:18 INFO mapred.JobClient: Job complete: job_local1627424039_0001
15/03/18 12:26:18 INFO mapred.JobClient: Counters: 0

Please help me to resolve this. Thanks

Well, are you sure that 'center' has the same number of dimensions as 'vector'? Why don't you print out the length of 'center' before the loop?

Also, as an aside, why are you using an L1 distance?
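The length check suggested above could be sketched roughly as follows. Because the loop in the question is bounded by the length of v, the access v.getVector()[i] is always in range, so a failure at index 1 points to a center array holding exactly one value. The class name DimensionCheck and the use of plain double[] arrays in place of ClusterCenter and Vector are assumptions made only for illustration, since those classes are not shown in the question.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the original project: the two double[]
// parameters stand in for center.getCenter().getVector() and v.getVector().
class DimensionCheck {

    // Manhattan (L1) distance that fails fast with a descriptive message
    // when the two arrays have different lengths.
    static double measureDistance(double[] center, double[] vector) {
        if (center.length != vector.length) {
            throw new IllegalArgumentException(
                "Dimension mismatch: center has " + center.length
                + " values " + Arrays.toString(center)
                + ", vector has " + vector.length
                + " values " + Arrays.toString(vector));
        }
        double sum = 0;
        for (int i = 0; i < vector.length; i++) {
            sum += Math.abs(center[i] - vector[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] center = {1.0};       // one dimension, as "index 1" in the trace suggests
        double[] vector = {1.0, 2.0};  // two dimensions
        try {
            measureDistance(center, vector);
        } catch (IllegalArgumentException e) {
            // Reports both lengths and contents instead of a bare index.
            System.out.println(e.getMessage());
        }
    }
}
```

A message like this shows which side is short, which helps trace the problem back to wherever the centers or vectors are parsed or written.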

Your loop condition is wrong: it should check the length of both arrays, not only the one in v. You can add the second length to the condition as shown here, or adapt it to your requirements.

int length = v.getVector().length;
for (int i = 0; i < length && i < center.getCenter().getVector().length; i++) {
    sum += Math.abs(center.getCenter().getVector()[i] - v.getVector()[i]);
}
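For reference, here is a minimal stand-alone sketch of the same bounded loop. It assumes plain double[] arrays instead of the project's ClusterCenter and Vector classes, and the class name BoundedDistance is hypothetical. One thing to keep in mind is that bounding the loop this way avoids the exception but silently ignores any dimensions that only one of the arrays has.

```java
// Hypothetical stand-alone version using plain double[] arrays instead of
// the project's ClusterCenter and Vector classes.
class BoundedDistance {

    // Manhattan (L1) distance over the dimensions both arrays share.
    // Extra trailing values in the longer array are ignored.
    static double measureDistance(double[] center, double[] vector) {
        int length = Math.min(center.length, vector.length);
        double sum = 0;
        for (int i = 0; i < length; i++) {
            sum += Math.abs(center[i] - vector[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] center = {1.0};
        double[] vector = {4.0, 6.0};
        // Only index 0 is compared; the 6.0 never contributes to the sum.
        System.out.println(measureDistance(center, vector)); // prints 3.0
    }
}
```

If the clustering result matters, it is probably worth finding out why the center has fewer values than the vector rather than relying on the truncated distance.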

