
Machine Learning - Classification or Clustering

I am new to machine learning and have a problem I would like to solve. I am hoping someone has ideas on what type of algorithm would be best to use. I am not looking for code, but rather a process.

Problem: I am classifying people into two categories: high risk and low risk. (This is a very basic starting point, and I will expand it as I learn how to classify in more detail.)

Each person has 11 variables, and each variable has a binary value (0 for no, 1 for yes). The variables are things like has married, gun_owner, home_owner, etc. So each person falls into one of 2^11 = 2048 possible combinations of these variables.

I have a data set that has this information and then the result (whether or not they committed a crime). I figured this data would be used for training and then the algorithm can make predictions on high risk individuals.

Does anyone have any ideas for what would be the best algorithm? Since there are so many variables, I am having trouble figuring out what may work best.

This is a binary classification problem in which each input is a binary string of length 11. There are many algorithms for this problem. One of the simplest is the naive Bayes model ( https://en.wikipedia.org/wiki/Naive_Bayes_classifier ). You could also try linear classifiers such as logistic regression or an SVM. Both are built for binary classification and work well when the data is linearly separable.
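To make the naive Bayes suggestion concrete, here is a minimal sketch of a Bernoulli naive Bayes classifier in plain Python. It is only an illustration under stated assumptions: the function names are hypothetical, the toy data is made up and uses 3 binary features instead of 11 to keep it short, and it assumes both classes appear in the training data. A library implementation would be the more practical choice for real work.

```python
import math

def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on binary feature vectors.

    X: list of equal-length lists of 0/1; y: list of 0/1 labels.
    alpha is the Laplace smoothing constant.
    Assumes both classes occur at least once in y.
    """
    n_features = len(X[0])
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        # Smoothed estimate of P(feature_j = 1 | class = c)
        p_one = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[c] = (prior, p_one)
    return model

def predict_proba_high_risk(model, x):
    """Return P(class = 1 | x) under the naive independence assumption."""
    log_scores = {}
    for c, (prior, p_one) in model.items():
        s = math.log(prior)
        for xj, pj in zip(x, p_one):
            s += math.log(pj if xj == 1 else 1.0 - pj)
        log_scores[c] = s
    # Normalise in log space for numerical stability
    m = max(log_scores.values())
    exp_scores = {c: math.exp(s - m) for c, s in log_scores.items()}
    return exp_scores[1] / (exp_scores[0] + exp_scores[1])

# Made-up toy data: 3 binary features per person, label 1 = high risk
X_train = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
y_train = [1, 1, 1, 0, 0, 0]

nb_model = train_bernoulli_nb(X_train, y_train)
p_high = predict_proba_high_risk(nb_model, [1, 0, 0])
p_low = predict_proba_high_risk(nb_model, [0, 0, 1])
print(p_high, p_low)
```

The output is a probability rather than a hard label, so the threshold that separates "high risk" from "low risk" stays a separate decision (0.5 is only the default choice).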

It seems like you want to classify people based on a few features. It looks like a simple binary classification problem. However, it is not clear whether the data you have is labeled.

So the first question is: in your dataset, do you know which person is 'high risk' and which person is 'low risk'? If you have that information, you can use any of a wide range of machine learning models for this classification task.

However, if the labels ('high risk' or 'low risk') are not present, you cannot do that. Then you have to consider unsupervised learning methods (clustering). Hope this answers your question.
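For the unlabeled case, one possible sketch is a two-cluster k-modes style procedure, which groups binary vectors by Hamming distance and uses per-feature majority votes as cluster centers. This is an assumption about what "clustering" could look like here, not a method named in the answer; the function names and the toy data (4 features instead of 11) are hypothetical.

```python
def hamming(a, b):
    """Number of positions where two binary vectors differ."""
    return sum(ai != bi for ai, bi in zip(a, b))

def k_modes_two_clusters(X, max_iter=20):
    """Split binary vectors into two clusters by Hamming distance."""
    # Deterministic start: the first row, and the row farthest from it
    centers = [list(X[0]), list(max(X, key=lambda x: hamming(x, X[0])))]
    assign = None
    for _ in range(max_iter):
        # Assign each row to its nearest center (ties go to cluster 0)
        new_assign = [
            min((0, 1), key=lambda c: hamming(x, centers[c])) for x in X
        ]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        for c in (0, 1):
            members = [x for x, a in zip(X, assign) if a == c]
            if members:
                # Per-feature majority vote (the "mode"); ties go to 1
                centers[c] = [
                    int(2 * sum(col) >= len(members)) for col in zip(*members)
                ]
    return assign, centers

# Made-up toy data: 4 binary features per person, no labels
X_unlabeled = [
    [1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 0, 0],
    [0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 1, 1],
]
cluster_ids, cluster_centers = k_modes_two_clusters(X_unlabeled)
print(cluster_ids, cluster_centers)
```

Note that the clusters carry no meaning by themselves: nothing says which group, if either, corresponds to "high risk", so the result still needs interpretation.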


 