
Looking for ideas to lower the false positive rate in Machine Learning Classification

Is there a way to reduce the false positive rate in a classic fraud prediction problem? I am currently working on fraud detection with 50,000 samples whose true labels come from investigations. The training labels are fairly balanced. The logistic regression model I chose performs well, with an F1 score above 90%. However, when I use the model to predict new cases, the results are split roughly 50/50 between fraud and non-fraud. Is there a way to tune the model so that it lets non-fraud cases through and penalizes false positives, so that we flag far fewer cases as fraud (probably fewer than 200 out of one million) but the flagged cases are highly likely to be fraud? I hope that makes the question clear.

Here are all the parameters that the scikit-learn LogisticRegression model takes:

sklearn.linear_model.LogisticRegression(penalty='l2', *, dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='lbfgs', max_iter=100, multi_class='auto', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None)

The default values usually work well, so if you have changed any parameters, try going back to the defaults first. If you are already using the defaults and still getting poor results, you may want to adjust the parameter values to suit your dataset. For that you need to know what each parameter means; if you don't, follow This link.
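One parameter from the list above that directly targets false positives is class_weight. The sketch below is an illustration under assumptions, not a tuned recipe: it uses synthetic data from make_classification in place of the real fraud data, treats class 0 as non-fraud and class 1 as fraud, and the weights {0: 10, 1: 1} are arbitrary. Giving class 0 a larger weight makes each misclassified non-fraud case (a false positive) cost more in the training loss, so the fitted model flags fewer cases as fraud.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic, roughly balanced data standing in for the labelled fraud cases
# (assumption: the real features are not shown in the question).
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.5, 0.5], random_state=0)

# Default: errors on both classes count equally in the loss.
plain = LogisticRegression(max_iter=1000).fit(X, y)

# Hypothetical weights: an error on a non-fraud case (class 0) costs 10x
# as much as an error on a fraud case (class 1).
weighted = LogisticRegression(class_weight={0: 10, 1: 1},
                              max_iter=1000).fit(X, y)

def false_positives(model):
    """Count non-fraud cases (y == 0) that the model predicts as fraud."""
    pred = model.predict(X)
    return int(np.sum((pred == 1) & (y == 0)))

def flagged(model):
    """Count all cases the model predicts as fraud."""
    return int(model.predict(X).sum())

print("plain:    FP =", false_positives(plain), "flagged =", flagged(plain))
print("weighted: FP =", false_positives(weighted), "flagged =", flagged(weighted))
```

In practice the weight would be chosen on held-out labelled data rather than fixed at 10; the counts above are computed on the training data only to keep the example short.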

So you want to bias the model towards predicting 'Not Fraud' more often. How to do this depends on the model you are using. With logistic regression, you can set a threshold on the model's output so that only instances whose output is close to 1 are classified as 'Fraud'. In sklearn, for example, you can access the model's output probabilities with predict_proba(X), or the log-probabilities with predict_log_proba(X); for a binary problem, predict(X) effectively uses a threshold of 0.5 on the positive-class probability. (source: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression )
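A minimal sketch of this thresholding with predict_proba, again on synthetic data (make_classification stands in for the real features, and 0.8 is simply the example cutoff used in this answer). It compares the default 0.5 rule with the stricter cutoff on a held-out split:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the labelled fraud data (assumption).
X, y = make_classification(n_samples=5000, n_features=20, random_state=1)
X_train, X_new, y_train, y_new = train_test_split(X, y, test_size=0.5,
                                                  random_state=1)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# predict_proba columns follow model.classes_; pick the column for label 1 (fraud).
fraud_col = list(model.classes_).index(1)
p_fraud = model.predict_proba(X_new)[:, fraud_col]

# Thresholding at 0.5 reproduces predict() for a binary model.
default_pred = (p_fraud > 0.5).astype(int)

# Stricter rule: only call a case 'Fraud' when P(fraud) > 0.8.
strict_pred = (p_fraud > 0.8).astype(int)

print("flagged at 0.5:", default_pred.sum())
print("flagged at 0.8:", strict_pred.sum())
```

Every case flagged at 0.8 is also flagged at 0.5, so raising the threshold can only reduce the number of alerts (and hence false positives), at the cost of missing some fraud.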

If your model is supposed to output 1 for 'Fraud', you can threshold the predicted probability of that class at a stricter value than 0.5, for example classifying a case as 'Fraud' only if its probability is above 0.8.
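Rather than guessing a cutoff such as 0.8, one option is to derive it from data. The helpers below are hypothetical (threshold_for_precision and threshold_for_budget are illustrative names, not sklearn functions) and sketch two approaches: pick the lowest threshold whose precision on labelled held-out data reaches a target (0.95 is an assumed target), or pick the threshold that flags roughly a fixed number of cases per million scored, matching the "fewer than 200 out of one million" budget in the question. Both assume the P(fraud) scores have already been computed, e.g. with predict_proba.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def threshold_for_precision(y_true, p_fraud, target_precision=0.95):
    """Lowest threshold whose precision on labelled data reaches the target.

    Cases with p_fraud >= threshold are flagged. Returns None if no
    threshold reaches the target precision.
    """
    precision, recall, thresholds = precision_recall_curve(y_true, p_fraud)
    # precision has one more entry than thresholds (the final point has
    # precision 1 and recall 0 with no threshold), so drop it to align.
    ok = np.flatnonzero(precision[:-1] >= target_precision)
    if ok.size == 0:
        return None
    # thresholds are increasing, so the first match is the lowest threshold.
    return float(thresholds[ok[0]])

def threshold_for_budget(p_fraud, max_per_million=200):
    """Threshold that flags roughly max_per_million cases per million scored.

    Cases with p_fraud > threshold are flagged; ties can shift the count slightly.
    """
    q = 1.0 - max_per_million / 1_000_000
    return float(np.quantile(p_fraud, q))

# Tiny illustrative example (made-up scores, not real results).
y_val = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])
p_val = np.array([0.05, 0.1, 0.2, 0.3, 0.35, 0.6, 0.7, 0.8, 0.9, 0.95])
print(threshold_for_precision(y_val, p_val, 0.95))  # lowest cutoff with no false positives here
```

Note that precision depends on how common fraud is: a threshold tuned on the fairly balanced labelled data will give lower precision on a stream where fraud is much rarer, which is one reason the budget-based cutoff can be a useful complement.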

