
What kind of machine learning problem is it if we have to predict a customer's next spend category in Python?

I have a data set of shape (6210782, 5).

It contains 200,000 unique customers and their transactions at different outlets. The time series spans a little over a year.

df.head()

customer_id TransactionDate TransationTime  Amount  OutletCategory
514         22-04-2015      19:42:18        9445    M16
514         23-04-2015      16:29:28        2000    M23
514         02-05-2015      15:17:55        1398    M16
514         27-06-2015      13:51:29        1995    M7
514         07-08-2015      17:31:30        2000    M23

What kind of machine learning problem is this, and what approach and algorithm should be used to solve the following tasks:

1) Predict the customer's next transaction category (I am thinking of this as multinomial classification).

2) Predict the customer's next transaction category in the next 6 hours.

3) Predict the customer's next transaction amount (is this an LSTM task?).

4) Predict the customer's next transaction amount in the next 6 hours.

As we have 200,000 unique customers, how should I prepare the data if I have to predict the next transaction amount? Should I pivot the customers to columns?


Data / time series exploration that may help visualize the data:

[Chart: transaction amount per category over the time series]

For the charts below, I created a small data set with "Datetime" as the index and an "Amount" column to understand the transactional behavior with respect to time.

[Chart: amount spent by transaction date]

[Chart: amount spent by week of transaction date]

[Chart: mean amount spent per hour of the day]


Expectations: I am new to data science and Python, so I am just looking for the right steps to proceed with the task (I will manage the code myself).

There will never be an exactly right answer to this kind of problem.

Regarding your questions:

Everything related to the 6-hour window (tasks 2 and 4) looks like a time series problem. That can be approached with, e.g., ARIMA models.
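To make the 6-hour framing concrete, here is a minimal sketch that turns the five sample rows from the question into a regular 6-hour series with pandas. The combined `Datetime` column and the choice to sum amounts per window are assumptions made for illustration, not something specified in the question.

```python
import pandas as pd

# The five sample rows shown in df.head() (customer 514).
df = pd.DataFrame({
    "customer_id": [514] * 5,
    "TransactionDate": ["22-04-2015", "23-04-2015", "02-05-2015",
                        "27-06-2015", "07-08-2015"],
    "TransationTime": ["19:42:18", "16:29:28", "15:17:55",
                       "13:51:29", "17:31:30"],
    "Amount": [9445, 2000, 1398, 1995, 2000],
    "OutletCategory": ["M16", "M23", "M16", "M7", "M23"],
})

# Combine the day-first date and the time into a single timestamp.
df["Datetime"] = pd.to_datetime(
    df["TransactionDate"] + " " + df["TransationTime"],
    format="%d-%m-%Y %H:%M:%S",
)

# Total spend per 6-hour window; windows without transactions sum to 0.
six_hourly = df.set_index("Datetime")["Amount"].resample("6h").sum()

print(six_hourly.head())
```

A regularly spaced series like `six_hourly` is the kind of input a time series model such as ARIMA expects. For a single customer it will be mostly zeros, so aggregating over groups of customers may be worth considering.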

3) This is a regression problem: you basically have to predict an amount that can take a wide range of values. A starting point could be linear regression, but there are also other algorithms for that.
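As a rough sketch of the regression idea, the example below fits scikit-learn's `LinearRegression` to predict each amount from the same customer's previous amount. The single lag feature `PrevAmount` is a hypothetical choice for illustration, and four training rows are far too few for a meaningful model.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Sample rows from the question, assumed to be in chronological order.
df = pd.DataFrame({
    "customer_id": [514] * 5,
    "Amount": [9445, 2000, 1398, 1995, 2000],
})

# Lag feature: the previous amount of the same customer.
# The data stays in long format; customers are not pivoted to columns.
df["PrevAmount"] = df.groupby("customer_id")["Amount"].shift(1)

# The first transaction of each customer has no previous amount.
train = df.dropna(subset=["PrevAmount"])

model = LinearRegression()
model.fit(train[["PrevAmount"]], train["Amount"])

# Predict the next amount from the most recent one.
latest = pd.DataFrame({"PrevAmount": [df["Amount"].iloc[-1]]})
next_amount = float(model.predict(latest)[0])
print(next_amount)
```

Computing lags per customer with `groupby(...).shift(...)` is one way to avoid pivoting 200,000 customers into columns. More lags, time-of-day features, or the outlet category could be added as further columns in the same way.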

1) This should be a multiclass classification problem; for this you could use, e.g., a decision tree.
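The multiclass idea could look roughly like the sketch below, which uses scikit-learn's `DecisionTreeClassifier` on the sample rows from the question. Using only the previous category as a one-hot feature (`PrevCategory`, a hypothetical column name) is an assumption made to keep the example short.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Sample rows from the question, assumed to be in chronological order.
df = pd.DataFrame({
    "customer_id": [514] * 5,
    "OutletCategory": ["M16", "M23", "M16", "M7", "M23"],
})

# Lag feature: the category of the same customer's previous transaction.
df["PrevCategory"] = df.groupby("customer_id")["OutletCategory"].shift(1)
train = df.dropna(subset=["PrevCategory"])

# One-hot encode the previous category; the target is the current category.
X = pd.get_dummies(train["PrevCategory"], prefix="prev", dtype=float)
y = train["OutletCategory"]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# Encode the most recent category with the same columns as the training data.
latest = pd.get_dummies(
    pd.Series([df["OutletCategory"].iloc[-1]]), prefix="prev", dtype=float
).reindex(columns=X.columns, fill_value=0.0)

next_category = clf.predict(latest)[0]
print(next_category)
```

With real data, `clf.predict_proba` would give a probability per category, which may be more useful than a single label when many categories are plausible.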

In general:

To give you more ideas: scikit-learn (https://scikit-learn.org/stable/) can be a good starting point for you.
