
How to change attribute type in WEKA?

I am new to WEKA and am using the Car Evaluation dataset. First, I copied all attributes, instances and values into Excel and saved them as a CSV file. I then opened that CSV file in WEKA. I can see the counts of classes, attributes and so on, but not for the doors and persons attributes. For those I get "Attribute is neither numeric nor nominal."

These attributes take values such as "2", "3" and "more", so they mix numeric and nominal values. In WEKA their type is string. How can I change the attribute types, or which method should I apply to see their visualization and counts?

WEKA can read a CSV file, but a CSV file carries no information about the types of the attributes. That is why WEKA encourages you to use the ARFF file format. ARFF is the same as CSV except that it has a header that describes the variables (and allows comments and other documentation). The header will contain things like

@attribute mpg numeric
@attribute cyl numeric
@attribute doors {2,3,more}

to indicate that mpg and cyl have numeric values, while doors is a nominal attribute (a factor) that can take any of the three values "2", "3", or "more". Be sure to list every possible value for nominal attributes like doors. You can simply add the header in a text editor if you know what it should look like. More details on the ARFF format are available on the WEKA site and the University of Waikato site.
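As a rough illustration of writing such a header, the sketch below builds ARFF text from simple CSV text using only the Java standard library. The class name `ArffHeaderSketch`, the method `toArff` and the relation name `car` are made up for this example. It assumes a CSV with a header row and no quoted commas, and it declares every column as nominal.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of WEKA: prepends an ARFF header to CSV data.
final class ArffHeaderSketch {

    // Converts simple CSV text (first line = column names, no quoted commas)
    // into ARFF text in which every column is declared nominal.
    static String toArff(String relation, String csv) {
        String[] lines = csv.trim().split("\\R");
        String[] names = lines[0].split(",");

        // Collect the distinct values of each column in order of first appearance.
        List<Set<String>> values = new ArrayList<>();
        for (int c = 0; c < names.length; c++) {
            values.add(new LinkedHashSet<>());
        }
        for (int r = 1; r < lines.length; r++) {
            String[] cells = lines[r].split(",");
            for (int c = 0; c < names.length && c < cells.length; c++) {
                values.get(c).add(cells[c].trim());
            }
        }

        StringBuilder out = new StringBuilder();
        out.append("@relation ").append(relation).append("\n\n");
        for (int c = 0; c < names.length; c++) {
            out.append("@attribute ").append(names[c].trim())
               .append(" {").append(String.join(",", values.get(c))).append("}\n");
        }
        out.append("\n@data\n");
        for (int r = 1; r < lines.length; r++) {
            out.append(lines[r].trim()).append("\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Tiny made-up sample in the spirit of the Car Evaluation data.
        String csv = "doors,persons,class\n2,2,unacc\n3,4,acc\nmore,more,good\n";
        System.out.print(toArff("car", csv));
    }
}
```

Because the sketch only sees values that occur in the data, any legal value that happens to be absent from the file would still have to be added to the braces by hand.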

Perhaps you should decide whether to make the attribute all numeric or all nominal (also known as categorical).

Benefits of an all-numeric attribute: algorithms can determine a mathematical relationship between that attribute and any other attribute, including the target (or desired output), e.g. correlation, dependence/independence, covariance. Furthermore, if you use tree-based algorithms, nodes can define decision rules such as doors>3 or persons<2.

Benefits of an all-nominal attribute: algorithms can finish faster because of the limited number of things that can be done with categorical values. Cons: most algorithms do not directly support nominal attributes. Tree-based algorithms are limited in the type of decision nodes they can produce, e.g. doors is '3' or persons is not 'more'.

Caveat: if the attribute you are dealing with is the target or desired output, making it all numeric will cause WEKA to interpret the task as a regression problem, while making it nominal will cause WEKA to interpret the task as a classification problem.

If you are interested in making your attribute all numeric, you could replace all occurrences of more with, say, -1 using Excel.
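If you would rather not do the replacement by hand, the same recoding can be sketched in plain Java. The class name `RecodeSketch` and the method `recodeCells` are hypothetical. The sketch assumes a simple CSV with a header row and no quoted commas, and it replaces only whole cells that equal the given token.

```java
// Hypothetical helper, not part of WEKA: recodes a text label as a number in CSV data.
final class RecodeSketch {

    // Replaces every data cell equal to `token` with `replacement`,
    // leaving the header row untouched.
    static String recodeCells(String csv, String token, String replacement) {
        String[] lines = csv.trim().split("\\R");
        StringBuilder out = new StringBuilder(lines[0]).append("\n");
        for (int r = 1; r < lines.length; r++) {
            String[] cells = lines[r].split(",", -1); // -1 keeps trailing empty cells
            for (int c = 0; c < cells.length; c++) {
                if (cells[c].trim().equals(token)) {
                    cells[c] = replacement;
                }
            }
            out.append(String.join(",", cells)).append("\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String csv = "doors,persons\n2,more\nmore,4\n";
        System.out.print(recodeCells(csv, "more", "-1"));
    }
}
```

The stand-in value -1 follows the suggestion above. Any number that cannot be confused with a real count would serve, but keep in mind that numeric algorithms will treat it as an ordinary quantity.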

If later down the road you need to go from an all-numeric to a nominal attribute, you could simply use a filter to do that. Or, if you are using the Java API, you could check Walter's solution:

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.NumericToNominal;

public class Main {
  public static void main(String[] args) throws Exception {
    // Load training instances with numeric attributes.
    // Assumption: the data file path is passed as the first command-line argument.
    Instances originalTrain = DataSource.read(args[0]);

    NumericToNominal convert = new NumericToNominal();
    String[] options = new String[2];
    options[0] = "-R";
    options[1] = "1-2";  // range of attributes to make nominal (1-based)

    convert.setOptions(options);
    convert.setInputFormat(originalTrain);

    // Apply the filter; the original data set is left unchanged.
    Instances newData = Filter.useFilter(originalTrain, convert);

    System.out.println("Before");
    for (int i = 0; i < 2; i = i + 1) {
      System.out.println("Nominal? " + originalTrain.attribute(i).isNominal());
    }

    System.out.println("After");
    for (int i = 0; i < 2; i = i + 1) {
      System.out.println("Nominal? " + newData.attribute(i).isNominal());
    }
  }
}

