What kind of machine learning problem is it if we have to predict a customer's next spend category in Python?
I have a data set of shape (6210782, 5).
It covers 200,000 unique customers and their transactions at different outlets. The time series spans a little over a year.
df.head()
customer_id TransactionDate TransationTime Amount OutletCategory
514 22-04-2015 19:42:18 9445 M16
514 23-04-2015 16:29:28 2000 M23
514 02-05-2015 15:17:55 1398 M16
514 27-06-2015 13:51:29 1995 M7
514 07-08-2015 17:31:30 2000 M23
What kind of machine learning problem is this, and what approach and algorithm should be used to solve the following tasks:
1) Predict the customer's next transaction category? (I am thinking of this as multinomial classification.)
2) Predict the customer's next transaction category in the next 6 hours?
3) Predict the customer's next transaction amount? (Is this an LSTM task?)
4) Predict the customer's next transaction amount in the next 6 hours?
As we have 200,000 unique customers, how should I prepare the data if I have to predict the next transaction amount? Should I pivot the customers to columns?
Data / time series exploration that may help visualize the data:
Below is the chart of transaction amount by category over the time series:
For the charts below, I created a small data set with "Datetime" as the index and an "Amount" column to understand the transactional behavior with respect to time.
Amount spent by transaction date chart:
Amount spent by weekly transaction date chart:
Mean amount spent in a day (hourly):
Expectations: I am new to data science and Python, so I am just looking for the right steps to proceed with the task (I will manage the code myself).
There will never be an exactly right answer to this kind of problem.
To your problems:
Everything related to 6 hours seems to be a time series problem. This works, e.g., with ARIMA models.
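As a rough sketch of the data preparation such a time series model would need, the transactions can be bucketed into 6-hour windows with pandas. The rows are copied from the `df.head()` output in the question; the `Datetime` column and the `six_hourly` name are assumptions made for illustration, and no particular ARIMA implementation is implied.

```python
import pandas as pd

# Sample rows copied from the question's df.head(); column names as in the question.
df = pd.DataFrame({
    "customer_id": [514, 514, 514, 514, 514],
    "TransactionDate": ["22-04-2015", "23-04-2015", "02-05-2015", "27-06-2015", "07-08-2015"],
    "TransationTime": ["19:42:18", "16:29:28", "15:17:55", "13:51:29", "17:31:30"],
    "Amount": [9445, 2000, 1398, 1995, 2000],
    "OutletCategory": ["M16", "M23", "M16", "M7", "M23"],
})

# Combine date and time into one timestamp (day-first format, as in the sample).
df["Datetime"] = pd.to_datetime(
    df["TransactionDate"] + " " + df["TransationTime"],
    format="%d-%m-%Y %H:%M:%S",
)

# Total amount per 6-hour window; windows without transactions sum to 0.
six_hourly = (
    df.set_index("Datetime")["Amount"]
    .resample(pd.Timedelta(hours=6))
    .sum()
)
```

The resulting regularly spaced series is the kind of input a time series model expects; whether to aggregate over all customers or per customer is a modelling choice the answer leaves open.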
3) is a regression: you basically have to predict an amount, which has a wide range of possible values. The starting point could be linear regression, but there are also other algorithms for that.
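A minimal sketch of the linear regression starting point, using the five amounts of customer 514 from the question. Using the previous amount as the only feature (`prev_amount`) is an assumption made for illustration, not a recommended feature set, and five rows are far too few for a meaningful model.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Amounts of customer 514 from the question's sample, already in time order.
df = pd.DataFrame({
    "customer_id": [514] * 5,
    "Amount": [9445, 2000, 1398, 1995, 2000],
})

# Assumption: the previous amount is the only feature.
# groupby keeps each customer's history separate while the data stays in long format.
df["prev_amount"] = df.groupby("customer_id")["Amount"].shift(1)
train = df.dropna(subset=["prev_amount"])

model = LinearRegression()
model.fit(train[["prev_amount"]], train["Amount"])

# Predict the amount that follows the customer's latest transaction.
latest = pd.DataFrame({"prev_amount": [df["Amount"].iloc[-1]]})
predicted_next = model.predict(latest)[0]
```

Because the lag feature is built per customer with `groupby`, this layout keeps one row per transaction rather than one column per customer.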
1) should be a multiclass problem; for this you could use, e.g., a decision tree.
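A minimal sketch of the decision tree idea with scikit-learn, again on the sample rows of customer 514. Treating the previous outlet category (`prev_category`) as the only feature is an assumption for illustration; a real model would likely need more features and far more data.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Outlet categories of customer 514 from the question's sample, in time order.
df = pd.DataFrame({
    "customer_id": [514] * 5,
    "OutletCategory": ["M16", "M23", "M16", "M7", "M23"],
})

# Assumption: the previous category is the only feature.
df["prev_category"] = df.groupby("customer_id")["OutletCategory"].shift(1)
train = df.dropna(subset=["prev_category"])

# One-hot encode the categorical feature; the target stays as class labels.
X = pd.get_dummies(train["prev_category"], dtype=int)
y = train["OutletCategory"]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# Encode the latest category with the same columns as the training features.
latest = pd.get_dummies(
    pd.Series([df["OutletCategory"].iloc[-1]]), dtype=int
).reindex(columns=X.columns, fill_value=0)
predicted_category = clf.predict(latest)[0]
```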
In general:
To give you more ideas: scikit-learn (https://scikit-learn.org/stable/) can be a good starting point for you.