
Python: How to find the most similar label for a given feature vector?

I am looking for a machine learning approach to find the most likely class label (with a probability value) for a given feature vector. I have a training set for n classes, and most of the feature vectors consist of boolean values. So far I have been thinking of counting the number of True values for each feature and normalizing it (e.g. m = number of training samples with value True for a feature, n = number of training samples, feat_val = m/n) to create a representative feature vector for each class. Once these are created, I would apply a similarity measure such as cosine distance or Euclidean distance between each class representation vector and the given feature vector. Can anyone suggest whether this approach is worth implementing?
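For concreteness, the approach described above could be sketched roughly as follows. This is only an illustration: the function names `class_prototypes` and `most_similar_label` and the toy data are made up here, and NumPy is assumed. Note that the resulting cosine scores are similarity values, not probabilities.

```python
import numpy as np

def class_prototypes(X, y):
    """Per-class fraction of True values for each feature (feat_val = m/n)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = np.unique(y)
    # Mean of a 0/1 column within a class is the fraction of True values.
    prototypes = np.vstack([X[y == label].mean(axis=0) for label in labels])
    return labels, prototypes

def most_similar_label(x, labels, prototypes):
    """Return the label whose prototype has the highest cosine similarity to x."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(prototypes, axis=1) * np.linalg.norm(x)
    # Avoid division by zero when a vector is all False.
    sims = (prototypes @ x) / np.where(norms == 0, 1.0, norms)
    return labels[int(np.argmax(sims))], sims

# Toy boolean training data, invented purely for illustration.
X_train = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
], dtype=bool)
y_train = np.array(["a", "a", "b", "b"])

labels, prototypes = class_prototypes(X_train, y_train)
label, sims = most_similar_label([1, 1, 0, 0], labels, prototypes)
print(label, sims)
```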

The problem you are trying to solve is called classification, and it is a major part of supervised learning. A great place to start is the open-source library scikit-learn and its documentation.

There are a lot of classification models available, but once you pick a specific one and train it, you simply use its predict_proba method to get the class probabilities for a given feature vector or matrix.
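As a minimal sketch of that workflow, here is one possible choice of model. `BernoulliNB` is my assumption, not something named above; it is picked because it is designed for boolean features, and the per-class parameters it estimates are essentially smoothed versions of the "fraction of True values" from the question. The toy data is made up. Keep in mind that not every scikit-learn classifier exposes predict_proba, so check the model you choose.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Toy 0/1 training data, invented purely for illustration.
X_train = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
])
y_train = np.array(["a", "a", "b", "b"])

# Train the classifier (default Laplace smoothing, alpha=1.0).
clf = BernoulliNB()
clf.fit(X_train, y_train)

# predict_proba expects a 2D array: one row per feature vector.
x_new = np.array([[1, 1, 0, 0]])
proba = clf.predict_proba(x_new)

# Columns of proba follow the order of clf.classes_.
best = clf.classes_[np.argmax(proba[0])]
for cls, p in zip(clf.classes_, proba[0]):
    print(f"{cls}: {p:.3f}")
print("most likely label:", best)
```

Any other classifier that provides predict_proba (for example logistic regression or a random forest) could be swapped in with the same fit / predict_proba calls.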


 