unable to upload CSV file for WEKA analysis - java

I am working on a big data analysis project and I am stuck at this point. I am trying to load a CSV file and use the WEKA Java API to analyze it. I want to tokenize the text, remove stop words, identify parts of speech (POS) and filter the nouns. However, I see the error below and have no idea why. An explanation and a solution would be great!

Error: 

   Exception in thread "main" java.io.IOException: wrong number of values. Read 21, expected 20, read Token[EOL], line 3
     at weka.core.converters.ConverterUtils.errms(ConverterUtils.java:912)
     at weka.core.converters.CSVLoader.getInstance(CSVLoader.java:819)
     at weka.core.converters.CSVLoader.getDataSet(CSVLoader.java:642)

Code :

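import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

public class CsvToArffExample {
    public static void main(String[] args) throws Exception {
        // load CSV (getDataSet() is the call that throws the IOException above)
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("C:\\fakepath\\CSVfilesample.csv"));
        Instances data = loader.getDataSet();

        // save ARFF
        ArffSaver saver = new ArffSaver();
        saver.setInstances(data);
        saver.setFile(new File("C:\\fakepath\\CSVfilesample.arff"));
        saver.setDestination(new File("C:\\fakepath\\CSVfilesample.arff"));
        saver.writeBatch();

        // read the ARFF file back and use the last attribute as the class
        Instances train;
        try (BufferedReader br = new BufferedReader(new FileReader("C:\\fakepath\\CSVfilesample.arff"))) {
            train = new Instances(br);
        }
        train.setClassIndex(train.numAttributes() - 1);

        // train Naive Bayes and evaluate it with 10-fold cross-validation
        NaiveBayes nb = new NaiveBayes();
        nb.buildClassifier(train);
        Evaluation eval = new Evaluation(train);
        eval.crossValidateModel(nb, train, 10, new Random(1));
        System.out.println(eval.toSummaryString("\nResults\n=====\n", true));
        System.out.println(eval.fMeasure(1) + " " + eval.precision(1) + " " + eval.recall(1));
    }
}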

This error is generally caused by an incorrectly formatted input file. Here the loader read 21 values on line 3 of the CSV but expected 20. There are a few possible reasons. Check the following points:

  • It is common practice to use the ARFF format instead of CSV because it has certain advantages over a CSV file. See: Can I use CSV?
  • Next, check the encoding of the file. If it is UTF-8, you will have to read the file as UTF-8. Reference: Text Categorization with WEKA
  • Third, check whether there are incompatible characters in your CSV, such as a %2 or something similar. Check for syntactically incorrect line endings and for any extra commas.

This error tells you that there is a problem with the file contents: they don't follow the format WEKA expects. Fix that and the error will disappear.
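
To find the offending rows before handing the file to WEKA, something like the following could help. It is only a rough sketch and not part of the WEKA API: the class name CsvFieldCheck and its methods are made up for illustration. It assumes a comma separator, double quotes around fields that contain commas, and one record per line (no line breaks inside quoted fields). It reads the file as UTF-8, so a file in another encoding will most likely fail with a decoding exception, which covers the encoding check as well.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of WEKA): reports CSV lines whose
// number of fields differs from the header row.
public class CsvFieldCheck {

    // Counts comma-separated fields in one line.
    // Commas inside double quotes are not treated as separators.
    static int countFields(String line) {
        int fields = 1;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // An escaped quote ("") toggles twice, leaving the state unchanged.
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                fields++;
            }
        }
        return fields;
    }

    // Returns the 1-based numbers of all non-empty lines whose field
    // count differs from that of the first line (the header).
    static List<Integer> findBadLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        if (lines.isEmpty()) {
            return bad;
        }
        int expected = countFields(lines.get(0));
        for (int i = 1; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            if (countFields(line) != expected) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java CsvFieldCheck <file.csv>");
            return;
        }
        // Throws an exception if the file is not valid UTF-8.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        if (lines.isEmpty()) {
            System.out.println("File is empty.");
            return;
        }
        int expected = countFields(lines.get(0));
        for (int lineNo : findBadLines(lines)) {
            System.out.println("line " + lineNo + ": found "
                    + countFields(lines.get(lineNo - 1)) + " fields, expected " + expected);
        }
    }
}
```

Run it with the path to the CSV file as the only argument. Any line it reports either has an extra or missing comma, or contains a comma in a field that is not quoted.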

Hope it helps. :)
