
Machine learning algorithm to classify only positive and unlabeled data

I am trying to classify text using only positive examples and unlabeled data. I want the algorithm to identify the positive data and mark everything else as negative. What would be a good machine learning algorithm for this kind of data? I tried several algorithms in Weka, but almost all of the classifiers give a lot of false positives.

If you believe that the unlabelled data is mostly negatives, then probably the best thing to do is to label all unlabelled data as "negative" and run your classifier of choice. Note that if an unlabelled test point is predicted to be positive, this does not mean the prediction is wrong: some of your unlabelled data could be positive. That makes it hard to judge how well your classifier is doing in this setting. If you believe that your unlabelled data might be biased toward the positives, then you're probably better off training a so-called "one-class classifier" on the positive data alone; a popular example is the one-class SVM.
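
To make the two options concrete, here is a rough sketch in plain Python. Everything in it is an assumption for illustration: the toy documents, the function names, and the 0.5 similarity threshold are made up. The first part labels all unlabelled text as negative and trains a small multinomial naive Bayes classifier. The second part is not a one-class SVM, just a simple centroid-and-cosine-similarity stand-in that, like a one-class classifier, is fit on the positives only.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

# Hypothetical toy data: a few known positives plus an unlabelled pool.
positives = ["cheap pills buy now", "buy cheap watches now", "cheap offer buy today"]
unlabelled = ["meeting at noon tomorrow", "lunch with the team",
              "project report due friday", "buy cheap pills today",
              "see you at the meeting"]

# Option 1: treat every unlabelled document as negative (label 0) and
# train an ordinary two-class classifier, here multinomial naive Bayes.
def train_nb(docs, labels):
    counts = {0: Counter(), 1: Counter()}
    for doc, y in zip(docs, labels):
        counts[y].update(tokenize(doc))
    return counts, Counter(labels), set(counts[0]) | set(counts[1])

def predict_nb(model, doc, alpha=1.0):
    counts, n_docs, vocab = model
    scores = {}
    for y in (0, 1):
        total = sum(counts[y].values())
        score = math.log(n_docs[y] / sum(n_docs.values()))  # class prior
        for tok in tokenize(doc):
            if tok in vocab:  # ignore words never seen in training
                # Laplace-smoothed word likelihood
                score += math.log((counts[y][tok] + alpha) / (total + alpha * len(vocab)))
        scores[y] = score
    return 1 if scores[1] > scores[0] else 0

nb_model = train_nb(positives + unlabelled,
                    [1] * len(positives) + [0] * len(unlabelled))

# Option 2: fit on the positives only. This is a centroid-based stand-in
# for a one-class classifier, NOT a one-class SVM.
def train_centroid(docs):
    centroid = Counter()
    for doc in docs:
        centroid.update(tokenize(doc))
    return centroid

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def predict_one_class(centroid, doc, threshold=0.5):
    # threshold is an assumed value; in practice it would need tuning
    return cosine(Counter(tokenize(doc)), centroid) >= threshold

centroid = train_centroid(positives)

for text in ["buy cheap pills now", "team meeting tomorrow"]:
    print(text, predict_nb(nb_model, text), predict_one_class(centroid, text))
```

Note that "buy cheap pills today" sits in the unlabelled pool and was trained as a negative, yet the first classifier still predicts it as positive. That is the situation described above: a positive prediction on unlabelled data is not necessarily an error.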


 