
WEKA: issue with attribute scales

I have one training set and multiple test sets (I'm classifying instances within a clustering framework, so the test instances are computed on the fly).

The instance attributes have different scales (the first ranges from 0 to 1, the second from 0 to 100).

How do my classifiers (logistic regression and SMO) deal with the fact that they don't have the entire test set at once?

In other words, how do they handle attributes on different scales if they don't know the maximum value in the test set?

thanks

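According to the Weka Javadocs, SMO "normalizes all attributes by default. (Note that the coefficients in the output are based on the normalized/standardized data, not the original data.)" The scaling ranges are computed from the training data and then applied to each test instance, so the classifier never needs to see the whole test set. The downside is that you'll get erroneous normalization if your training set doesn't cover the full range of each attribute: a test value beyond the training maximum is mapped to something greater than 1. How bad that is depends on your data.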
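To make the point about training-derived ranges concrete, here is a minimal sketch in plain Java. It is not Weka's code: MinMaxScaler, fit and transform are hypothetical names made up for illustration, and the sketch assumes simple min-max scaling to [0, 1] with ranges taken from the training data only.

```java
import java.util.Arrays;

// Illustrative min-max scaler (hypothetical class, not part of Weka).
// The ranges are learned once from the training data and then reused
// for every test instance, one instance at a time.
final class MinMaxScaler {
    private final double[] min;
    private final double[] max;

    private MinMaxScaler(double[] min, double[] max) {
        this.min = min;
        this.max = max;
    }

    // Compute the per-attribute minimum and maximum from the training rows.
    static MinMaxScaler fit(double[][] train) {
        int n = train[0].length;
        double[] min = new double[n];
        double[] max = new double[n];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] row : train) {
            for (int j = 0; j < n; j++) {
                min[j] = Math.min(min[j], row[j]);
                max[j] = Math.max(max[j], row[j]);
            }
        }
        return new MinMaxScaler(min, max);
    }

    // Scale a single instance using the training ranges.
    // Values outside the training range land outside [0, 1].
    double[] transform(double[] instance) {
        double[] out = new double[instance.length];
        for (int j = 0; j < instance.length; j++) {
            double range = max[j] - min[j];
            // A constant attribute has no range; map it to 0 (an arbitrary choice).
            out[j] = range == 0.0 ? 0.0 : (instance[j] - min[j]) / range;
        }
        return out;
    }

    public static void main(String[] args) {
        // Training data where the second attribute only covers 10..60,
        // although its real range is 0..100.
        double[][] train = {{0.2, 10.0}, {0.8, 60.0}};
        MinMaxScaler scaler = MinMaxScaler.fit(train);

        // A test instance arriving on the fly; its second attribute
        // scales to about 1.8, i.e. beyond the expected [0, 1] interval.
        double[] scaled = scaler.transform(new double[] {0.5, 100.0});
        System.out.println(Arrays.toString(scaled));
    }
}
```

The test instance is scaled without any knowledge of the other test instances, which is why on-the-fly classification works at all. The cost is that an unrepresentative training set produces out-of-range values like the one above.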

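I suggest you try training both with and without normalization and see what works best. As far as I can tell, attribute scaling in SMO is controlled through setFilterType (passing new SelectedTag(SMO.FILTER_NONE, SMO.TAGS_FILTER) turns it off, and SMO.FILTER_STANDARDIZE switches to standardization). setFeatureSpaceNormalization(false), which exists in older Weka releases, affects feature-space normalization of the kernel rather than attribute scaling, so check the Javadocs for the version you are running.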
