Finding tendencies and patterns with SKLearn
So, I have a dataframe in the following format:
   Gender      Customer Type  Age   Type of Travel     Class
0    Male     Loyal Customer   13  Personal Travel  Eco Plus
1    Male  disloyal Customer   25  Business travel  Business
2  Female     Loyal Customer   26  Business travel  Business
3  Female     Loyal Customer   25  Business travel  Business
4    Male     Loyal Customer   61  Business travel  Business
5    Male  disloyal Customer   20  Business travel       Eco
6  Female  disloyal Customer   24  Business travel       Eco
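I want to build a profile of the people who are more likely to become loyal customers, so that I can generate insights to address them properly. For this I was recommended the SKLearn library, but I did some research and could not find a good way in, since it is a very large library. So, if anyone has experience with this library, or would recommend another one, could you point me in the right direction and explain which functions are best suited to getting that result? By the way, it does not need to be a perfect method; I am looking for some simple code that gives a general answer.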
The simplest model would be logistic regression. For example, here is code for a case where only age has predictive value:
import numpy as np
import pandas as pd
from sklearn import linear_model

# Simulated ages, drawn as normal(mean, standard deviation, sample size) for each group
ages_loyal = np.random.normal(50, 100, 500)
ages_disloyal = np.random.normal(35, 10, 500)
# Keep only ages strictly between 17 and 96
ages_loyal = ages_loyal[(ages_loyal > 17) & (ages_loyal < 96)]
ages_disloyal = ages_disloyal[(ages_disloyal > 17) & (ages_disloyal < 96)]
# Every column except Age is drawn the same way for both groups, so only Age carries signal
df_loyal = pd.DataFrame({'Gender': np.random.choice(['Male', 'Female'], size=ages_loyal.shape[0]), 'Customer Type': ['Loyal Customer'] * ages_loyal.shape[0], 'Age': ages_loyal.astype(np.int64), 'Type of Travel': np.random.choice(['Personal Travel', 'Business travel'], size=ages_loyal.shape[0], p=[0.09, 0.91]), 'Class': np.random.choice(['Eco Plus', 'Business', 'Eco'], size=ages_loyal.shape[0])})
df_disloyal = pd.DataFrame({'Gender': np.random.choice(['Male', 'Female'], size=ages_disloyal.shape[0]), 'Customer Type': ['disloyal Customer'] * ages_disloyal.shape[0], 'Age': ages_disloyal.astype(np.int64), 'Type of Travel': np.random.choice(['Personal Travel', 'Business travel'], size=ages_disloyal.shape[0], p=[0.09, 0.91]), 'Class': np.random.choice(['Eco Plus', 'Business', 'Eco'], size=ages_disloyal.shape[0])})
# ignore_index gives the combined frame a unique row index
df = pd.concat([df_loyal, df_disloyal], ignore_index=True)
# One-hot encode the categorical columns; 'Customer Type' is the target
X = pd.get_dummies(df.drop(columns='Customer Type'))
# max_iter is raised so the solver has room to converge on the unscaled Age column
model = linear_model.LogisticRegression(max_iter=1000)
model.fit(X, df['Customer Type'])
# Predictions on the training data itself, so this is an in-sample check only
predicted = model.predict(X)
results = pd.DataFrame({'predicted': predicted, 'actual': df['Customer Type']})
# Confusion table: predicted labels in rows, actual labels in columns
print(pd.crosstab(results['predicted'], results['actual']))
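To turn a fitted model like this into a rough profile, one option is to read its coefficients: a positive coefficient means the feature pushes the prediction towards "loyal", with the other features held fixed. The snippet below is only a sketch under stated assumptions. The data is synthetic (both groups are drawn with an age standard deviation of 10, which is my choice for illustration), and `make_group`, the fixed seed, the `drop_first=True` encoding and the 1 = loyal target are all things I introduced, not part of the question.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption for illustration, not the real dataset)
rng = np.random.default_rng(0)

def make_group(label, mean_age, n=500):
    """Hypothetical helper: one group of customers whose only distinguishing trait is age."""
    return pd.DataFrame({
        'Gender': rng.choice(['Male', 'Female'], size=n),
        'Customer Type': label,
        'Age': rng.normal(mean_age, 10, n).clip(18, 95).astype(int),
        'Type of Travel': rng.choice(['Personal Travel', 'Business travel'], size=n),
        'Class': rng.choice(['Eco Plus', 'Business', 'Eco'], size=n),
    })

df = pd.concat(
    [make_group('Loyal Customer', 50), make_group('disloyal Customer', 35)],
    ignore_index=True,
)

# One-hot encode; drop_first removes one redundant level per categorical column
X = pd.get_dummies(df.drop(columns='Customer Type'), drop_first=True, dtype=float)
# Encode the target explicitly so that 1 means "loyal"
y = (df['Customer Type'] == 'Loyal Customer').astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)

# One coefficient per feature, largest (most "loyal-leaning") first
coefficients = pd.Series(model.coef_[0], index=X.columns).sort_values(ascending=False)
print(coefficients)
```

Keep in mind that Age is measured in years while the other columns are 0/1 indicators, so the magnitudes are not directly comparable; the signs are the safer thing to read. On real data it would also be worth checking the model on rows it was not trained on before trusting the profile.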