
Finding tendencies and patterns with SKLearn

So, I have a dataframe in the following format:

   Gender  Customer Type      Age  Type of Travel   Class
0  Male    Loyal Customer     13   Personal Travel  Eco Plus
1  Male    disloyal Customer  25   Business travel  Business
2  Female  Loyal Customer     26   Business travel  Business
3  Female  Loyal Customer     25   Business travel  Business
4  Male    Loyal Customer     61   Business travel  Business
5  Male    disloyal Customer  20   Business travel  Eco
6  Female  disloyal Customer  24   Business travel  Eco

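I want to build a profile of the kind of person who is more likely to be a loyal customer, so I can generate insights and address those customers properly. SKLearn was recommended to me for this, but after some research I could not find a good approach, since it is a very large library. If anyone has experience with this library, or would recommend another one, could you point me in the right direction and explain which functions are best suited to get that result? By the way, the method does not need to be perfect. I am looking for some simple code that gives a general answer.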

The simplest model would be logistic regression. For example, here is code for a case where only age has predictive value:

import numpy as np
import pandas as pd
from sklearn import linear_model

# Simulate ages for each group, then keep only values between 17 and 96.
ages_loyal = np.random.normal(50, 100, 500)
ages_disloyal = np.random.normal(35, 10, 500)
ages_loyal = ages_loyal[(ages_loyal > 17) & (ages_loyal < 96)]
ages_disloyal = ages_disloyal[(ages_disloyal > 17) & (ages_disloyal < 96)]

def make_customers(ages, customer_type):
    # Every column except Age is drawn the same way for both groups,
    # so Age is the only feature that carries any signal.
    n = ages.shape[0]
    return pd.DataFrame({
        'Gender': np.random.choice(['Male', 'Female'], size=n),
        'Customer Type': [customer_type] * n,
        'Age': ages.astype(np.int64),
        'Type of Travel': np.random.choice(['Personal Travel', 'Business travel'], size=n, p=[0.09, 0.91]),
        'Class': np.random.choice(['Eco Plus', 'Business', 'Eco'], size=n),
    })

# ignore_index=True gives the combined frame a unique 0..n-1 index.
df = pd.concat([make_customers(ages_loyal, 'Loyal Customer'),
                make_customers(ages_disloyal, 'disloyal Customer')], ignore_index=True)

# One-hot encode the categorical columns once and reuse the result.
X = pd.get_dummies(df.drop(columns='Customer Type'))
y = df['Customer Type']

# max_iter is raised above the default to avoid convergence warnings on unscaled ages.
model = linear_model.LogisticRegression(max_iter=1000)
model.fit(X, y)

predicted = model.predict(X)
results = pd.DataFrame({'predicted': predicted, 'actual': y})

# Confusion table: rows are predictions, columns are the true labels.
print(pd.crosstab(results['predicted'], results['actual']))
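
The crosstab only shows how well the model separates the two groups. To get closer to the "profile" the question asks for, one option is to look at the fitted coefficients. The sketch below is not part of the original answer: the helper name `loyalty_profile`, the seeded synthetic data, and the choice to code loyal customers as 1 are all assumptions made for illustration. A positive coefficient means the feature pushes the prediction toward "Loyal Customer", a negative one away from it.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def loyalty_profile(df, target="Customer Type", positive="Loyal Customer"):
    """Hypothetical helper: one logistic-regression coefficient per
    one-hot encoded feature, sorted from most to least loyal-leaning."""
    X = pd.get_dummies(df.drop(columns=target), dtype=float)
    y = (df[target] == positive).astype(int)  # 1 = loyal, 0 = anything else
    model = LogisticRegression(max_iter=1000).fit(X, y)
    return pd.Series(model.coef_[0], index=X.columns).sort_values(ascending=False)

# Synthetic data (an assumption, not real customers): loyal customers
# are older on average, every other column is pure noise.
rng = np.random.default_rng(0)
n = 400
demo = pd.DataFrame({
    "Gender": rng.choice(["Male", "Female"], size=2 * n),
    "Age": np.concatenate([rng.normal(50, 10, n), rng.normal(35, 10, n)]).astype(int),
    "Class": rng.choice(["Eco Plus", "Business", "Eco"], size=2 * n),
    "Customer Type": ["Loyal Customer"] * n + ["disloyal Customer"] * n,
})

profile = loyalty_profile(demo)
print(profile)
```

Keep in mind that the coefficients are on different scales (Age is in years, the dummy columns are 0/1), so their magnitudes are not directly comparable unless the numeric columns are standardized first. Treat the signs as a rough first reading rather than a definitive ranking.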

