To make group by predictions using test & train data, group by multiple columns

I am very new to machine learning and algorithms. I am still learning the various ML concepts, so please excuse my ignorance.

I am working on a project in which I need to predict the number of sales rep calls to make in the next quarter, based on the rep_calls in the historical data. I am providing a sample dataframe below for reference; suggestions are welcome.

The rep_calls prediction for QTR4 should be based on the rep calls for each CUSTOMER_NUMBER and PRODUCT_ID combination that are available for the last 3 quarters.

import pandas as pd

df = pd.DataFrame({"CUSTOMER_NUMBER": ["CUST1", "CUST1", "CUST1", "CUST1", "CUST1", "CUST1", "CUST1", "CUST1", "CUST1", "CUST2", "CUST2", "CUST2", "CUST2", "CUST2", "CUST2", "CUST2", "CUST3", "CUST3", "CUST3", "CUST4", "CUST4", "CUST4"],
"PRODUCT": ["PRODUCT1", "PRODUCT2", "PRODUCT3", "PRODUCT1", "PRODUCT2", "PRODUCT3", "PRODUCT1", "PRODUCT2", "PRODUCT3", "PRODUCT1", "PRODUCT2", "PRODUCT3", "PRODUCT1", "PRODUCT2", "PRODUCT3", "PRODUCT3", "PRODUCT3", "PRODUCT3", "PRODUCT3", "PRODUCT1", "PRODUCT1", "PRODUCT2"],
"REP_VISITS": ["3", "3", "3", "3", "3", "3", "4", "4", "4", "3", "2", "2", "4", "6", "8", "5", "3", "1", "3", "2", "0", "3"],
"QTR": ["QTR1", "QTR1", "QTR1", "QTR2", "QTR2", "QTR2", "QTR3", "QTR3", "QTR3", "QTR1", "QTR1", "QTR1", "QTR2", "QTR2", "QTR2", "QTR3", "QTR1", "QTR2", "QTR3", "QTR1", "QTR2", "QTR3"],
"START_DATE": ["2020-01-01", "2020-01-01", "2020-01-01", "2020-04-01", "2020-04-01", "2020-04-01", "2020-07-01", "2020-07-01", "2020-07-01", "2020-01-01", "2020-01-01", "2020-01-01", "2020-04-01",  "2020-04-01", "2020-04-01","2020-07-01", "2020-01-01", "2020-04-01", "2020-07-01", "2020-01-01", "2020-04-01", "2020-07-01"],
"END_DATE": ["2020-03-31", "2020-03-31", "2020-03-31", "2020-06-30", "2020-06-30", "2020-06-30", "2020-09-30", "2020-09-30", "2020-09-30", "2020-03-31", "2020-03-31", "2020-03-31", "2020-06-30", "2020-06-30", "2020-06-30", "2020-09-30", "2020-03-31", "2020-06-30", "2020-09-30", "2020-03-31", "2020-06-30", "2020-09-30"]})

The dataframe looks as shown below:

[image: enter image description here]

I need to find the predicted rep_calls for QTR4:

CUST1|PRODUCT1||QTR4|
CUST1|PRODUCT2||QTR4|
CUST1|PRODUCT3||QTR4|
CUST2|PRODUCT1||QTR4|
CUST2|PRODUCT2||QTR4|
CUST2|PRODUCT3||QTR4|
CUST3|PRODUCT3||QTR4|
CUST4|PRODUCT1||QTR4|
CUST4|PRODUCT2||QTR4|

Please guide me on how I can create a training dataset for the customers/products, with appropriate predictions, so that I can use the test_data for predictions/evaluation.

I think you could try using the customer number and product ID as features and train a simple classifier using logistic regression or decision trees. You could one-hot encode the different customer numbers and product IDs. With this approach, REP_VISITS would be the labels and the features would be CUST1, CUST2, CUST3, PRODUCT1, PRODUCT2, etc. scikit-learn has implementations of these algorithms, which are easy to use. Hope this helps:

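import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Feature names: one column per customer and one per product
unique_cust_nos = df['CUSTOMER_NUMBER'].unique()
unique_products = df['PRODUCT'].unique()
feature_cols = list(unique_cust_nos) + list(unique_products)
label = 'REP_VISITS'

# One-hot encode so that ['CUST1', 'CUST2', 'CUST3', 'CUST4', 'PRODUCT1', 'PRODUCT2', 'PRODUCT3']
# are feature columns and REP_VISITS is the label
all_features_df = pd.get_dummies(df[['CUSTOMER_NUMBER', 'PRODUCT']], prefix='', prefix_sep='')
all_features_df[label] = df[label].astype(int)  # the sample data stores visits as strings

X = all_features_df[feature_cols]  # Features
y = all_features_df[label]  # Target variable
# Create Decision Tree classifier object
clf = DecisionTreeClassifier()
# Train Decision Tree classifier
clf = clf.fit(X, y)

# Assumption: the test set is one row per customer/product pair seen in the data (the QTR4 rows)
pairs = df[['CUSTOMER_NUMBER', 'PRODUCT']].drop_duplicates()
X_test = pd.get_dummies(pairs, prefix='', prefix_sep='').reindex(columns=feature_cols, fill_value=0)
# Predict the response for the test dataset
y_pred = clf.predict(X_test)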
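
As a point of comparison for the classifier above, it may help to first compute a plain per-group baseline. The sketch below is not the approach described in this answer. It is a simple alternative that groups by both CUSTOMER_NUMBER and PRODUCT and uses the mean of the past REP_VISITS as the QTR4 estimate. The data is a small subset of the question's dataframe with REP_VISITS converted to integers, and the column name PREDICTED_VISITS is made up for illustration.

```python
import pandas as pd

# Small subset of the question's data (REP_VISITS stored as integers here).
df = pd.DataFrame({
    "CUSTOMER_NUMBER": ["CUST1", "CUST1", "CUST1", "CUST3", "CUST3", "CUST3",
                        "CUST4", "CUST4", "CUST4"],
    "PRODUCT": ["PRODUCT1", "PRODUCT1", "PRODUCT1", "PRODUCT3", "PRODUCT3", "PRODUCT3",
                "PRODUCT1", "PRODUCT1", "PRODUCT2"],
    "REP_VISITS": [3, 3, 4, 3, 1, 3, 2, 0, 3],
    "QTR": ["QTR1", "QTR2", "QTR3", "QTR1", "QTR2", "QTR3", "QTR1", "QTR2", "QTR3"],
})

# Group by multiple columns: one group per customer/product pair.
keys = ["CUSTOMER_NUMBER", "PRODUCT"]

# Baseline estimate: mean of the historical visits within each group.
# PREDICTED_VISITS is a hypothetical column name chosen for this sketch.
baseline = (
    df.groupby(keys, as_index=False)["REP_VISITS"]
      .mean()
      .rename(columns={"REP_VISITS": "PREDICTED_VISITS"})
)
baseline["QTR"] = "QTR4"

print(baseline)
```

Any model trained on the one-hot features can then be judged against this table: if it does not beat the group mean, the extra machinery is probably not adding much for this dataset.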
