
How to predict a binary outcome with categorical and continuous features using scikit-learn?

Original post: 2016-07-29 14:44:28 | Tags: python, r, machine-learning

I need advice on choosing a model and a machine learning algorithm for a classification problem.

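I am trying to predict a binary outcome for a subject. My dataset has 500,000 records and 20 features, a mix of continuous and categorical. Each subject has 10-20 records. The data is labeled with its outcome.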

So far, based on the scikit-learn cheat-sheet, I am considering a logistic regression model and kernel approximation.

I am not sure where to start when implementing this in R or Python.

Thanks!
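For reference, the approach named in the question (logistic regression, optionally on top of a kernel approximation) could be sketched in Python with scikit-learn roughly as follows. This is a minimal sketch, not the asker's code: it assumes the categorical features have already been encoded as numbers, and it uses synthetic data from make_classification in place of the real 500,000 records. Nystroem is one of scikit-learn's kernel-approximation transformers (RBFSampler is another); the gamma and n_components values are arbitrary illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (all features already numeric).
X, y = make_classification(n_samples=5000, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    # Baseline: a plain linear logistic regression.
    "linear": make_pipeline(StandardScaler(),
                            LogisticRegression(max_iter=1000)),
    # Kernel approximation: map the inputs through an approximate RBF-kernel
    # feature map, then fit the same linear model on the mapped features.
    "nystroem": make_pipeline(
        StandardScaler(),
        Nystroem(kernel="rbf", gamma=0.05, n_components=300, random_state=0),
        LogisticRegression(max_iter=1000)),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on held-out data
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

If the kernel-approximation pipeline does not clearly beat the linear baseline on the real data, its extra cost is probably not worth paying.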

2 answers

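In any data-mining project, choosing an algorithm and tuning its parameters is a hard task, because both have to be tailored to your data and your problem. Try several algorithms, such as SVM, random forest, logistic regression, KNN and so on, cross-validate each of them, and then compare the results. In scikit-learn you can use GridSearchCV to try different parameter values and tune each algorithm. Also try this project, which uses a genetic algorithm to search over a range of parameters.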
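A minimal sketch of "cross-validate several algorithms, then tune one with a grid search", assuming synthetic data in place of the real records. One assumption goes beyond the answer: because each subject has 10-20 records, the sketch uses GroupKFold with a hypothetical subject_id array so that records from the same subject never appear in both the training and the validation fold (otherwise the scores can be optimistic). ROC AUC is an arbitrary choice of metric, and the genetic-algorithm project mentioned above is not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, GroupKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
# Hypothetical subject IDs: 100 subjects with 10 records each.
subject_id = np.repeat(np.arange(100), 10)
cv = GroupKFold(n_splits=5)  # keeps each subject's records in a single fold

candidates = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

# Step 1: compare the algorithms with the same cross-validation splits.
cv_scores = {}
for name, model in candidates.items():
    fold_scores = cross_val_score(model, X, y, cv=cv, groups=subject_id,
                                  scoring="roc_auc")
    cv_scores[name] = fold_scores.mean()
    print(f"{name}: mean ROC AUC = {cv_scores[name]:.3f}")

# Step 2: tune one candidate; make_pipeline names the SVC step "svc".
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]}
search = GridSearchCV(candidates["svm"], param_grid, cv=cv, scoring="roc_auc")
search.fit(X, y, groups=subject_id)
print("best SVM params:", search.best_params_, f"AUC={search.best_score_:.3f}")
```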

Features

If the categorical features do not have too many distinct values, you might want to look at sklearn.preprocessing.OneHotEncoder.
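One common way to combine one-hot-encoded categorical columns with scaled continuous columns is a ColumnTransformer inside a Pipeline. The sketch below assumes a pandas DataFrame; the column names (age, income, region, plan) and the synthetic outcome are hypothetical stand-ins for the real features.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 2000
# Hypothetical columns standing in for the real continuous/categorical features.
df = pd.DataFrame({
    "age": rng.normal(50, 10, n),
    "income": rng.normal(3000, 800, n),
    "region": rng.choice(["north", "south", "east", "west"], n),
    "plan": rng.choice(["basic", "premium"], n),
})
# Synthetic binary outcome driven by one continuous and one categorical feature.
logit = 0.08 * (df["age"] - 50) + np.where(df["plan"] == "premium", 1.5, -1.5)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

numeric = ["age", "income"]
categorical = ["region", "plan"]

preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    # handle_unknown="ignore": unseen categories at predict time become all zeros
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = Pipeline([("prep", preprocess),
                  ("clf", LogisticRegression(max_iter=1000))])

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, stratify=y, random_state=0)
model.fit(X_train, y_train)  # encoder and scaler are fit on training rows only
accuracy = model.score(X_test, y_test)

# 2 scaled numeric columns + 4 region dummies + 2 plan dummies
n_encoded = model.named_steps["prep"].transform(X_test.head()).shape[1]
print(f"accuracy={accuracy:.3f}, encoded feature count={n_encoded}")
```

Keeping the encoder inside the pipeline means the same transformation is applied automatically during cross-validation and at prediction time.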

Model selection

Which model is "best" depends mainly on how much training data you have and how simple you expect the decision boundary to be.

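You can try reducing the dimensionality to 2 or 3 dimensions. You can then visualize the data and see whether there is a good decision boundary.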
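The answer does not name a specific reduction method; PCA is used here as one common, assumed choice. This minimal sketch projects a synthetic, already-numeric feature matrix onto two dimensions (categorical columns would need to be one-hot encoded first).

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the (already encoded) feature matrix.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           random_state=0)

# Standardize first so that no feature dominates just because of its scale,
# then project onto the first two principal components.
projector = make_pipeline(StandardScaler(), PCA(n_components=2))
X_2d = projector.fit_transform(X)

# Fraction of the total variance kept by each of the two components.
explained = projector.named_steps["pca"].explained_variance_ratio_
print("explained variance ratio:", explained.round(3))
```

To look for a decision boundary, plot X_2d[:, 0] against X_2d[:, 1] coloured by y (for example with matplotlib's scatter). If the explained variance ratio is low, the 2-D picture may hide structure that exists in the full feature space; non-linear methods such as sklearn.manifold.TSNE are an alternative for visualization.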

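With 500,000 training examples, you could consider a neural network. I can recommend Keras for beginners and TensorFlow for people who already know how neural networks work.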
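The answer recommends Keras or TensorFlow; to stay within scikit-learn, this sketch swaps in sklearn.neural_network.MLPClassifier, a simple feed-forward network, as a stand-in. The layer sizes and other settings are illustrative assumptions, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=5000, n_features=20, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

net = make_pipeline(
    StandardScaler(),  # neural networks train much better on scaled inputs
    MLPClassifier(hidden_layer_sizes=(64, 32),  # two hidden layers (arbitrary)
                  early_stopping=True,  # hold out 10% of training data, stop when it plateaus
                  max_iter=200, random_state=0),
)
net.fit(X_train, y_train)

proba = net.predict_proba(X_test)[:, 1]  # predicted probability of class 1
accuracy = net.score(X_test, y_test)
print(f"test accuracy = {accuracy:.3f}")
```

For the full 500,000-record dataset, a dedicated framework such as the ones the answer names gives more control over architecture, batching and GPU training.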

You should also be aware of ensemble methods.

The sklearn tutorial you already found has a nice cheat-sheet:

[Image: scikit-learn algorithm cheat-sheet (source: scikit-learn.org)]

Just try them out and compare the results. Without more information, it is impossible to give you better advice.


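Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.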
