
How to transform a time-series trend into a measurable predictor variable

I have time-series data showing the number of frauds in transactions over a one-year timeline, along with a target variable indicating fraud or not.

The x-axis is the timeline and the y-axis is the number of frauds detected.

Is there an ML model or statistical technique that identifies the trend in these frauds and converts it into a measurable predictor variable with a value from 0 to 1, where values close to 1 indicate a higher propensity for fraud and vice versa?

The trend in frauds over a year is non-linear, so is there a mathematical transformation I can apply to the time series that would give me a measurable feature?

Any suggestions are much appreciated.

I thought of using an ordinary slope technique, where a negative slope with respect to the timeline means less fraud and a positive slope means more fraud. That only captures a linear trend, but I need to capture a non-linear trend.

Edit:

I forgot one important point. Here is a scenario that explains it better.

Say I have 1000 banks, and for each bank I have 12 months of data on how many frauds were detected per month, plus a corresponding target variable indicating whether that bank has a high chance of fraud or not.

Now, when I encounter a new bank with its fraud counts over 12 months, how can I determine whether that bank is fraudulent, using the fraud patterns of the 1000 banks?

Can we use a time-series approach? I assume that for a single bank a time-series model would handle it. But since I have multiple banks, I guess training a model with non-linear regression techniques, treating each month as one feature, might help? That way I would get a polynomial equation that I can use to predict the target?

Please share your thoughts as well.

I'm going to assume your data includes risk variables (customer data, loan data, etc.). I have used linear models, logistic models, and conditional inference trees for this. The following is a very high-level view; you really need to understand the underlying methods to get a good, functional model. I recommend using dummy variables with these; binary ones are best for easy interpretation.

A linear or logistic model will result in an equation you can use to measure the risk of each record (loan). This method requires removing outliers (for example, by checking Cook's distance), etc.
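As a rough illustration of the outlier step, the sketch below flags high-influence records with Cook's distance and refits without them. The data are simulated, the names `loans`, `variable_1`, `variable_2`, and `default_flag` are hypothetical, and the 4/n cutoff is only a common rule of thumb, not something prescribed here.

```r
# Simulated loan data (purely illustrative; all names are hypothetical).
set.seed(7)
n <- 200
loans <- data.frame(variable_1 = rnorm(n), variable_2 = rnorm(n))
loans$default_flag <- rbinom(n, 1, plogis(0.5 * loans$variable_1))

# Fit a linear model on the default flag.
fit <- lm(default_flag ~ variable_1 + variable_2, data = loans)

# Cook's distance measures how much each record influences the fit.
cd <- cooks.distance(fit)

# Rule-of-thumb cutoff (an assumption): keep records with distance <= 4 / n.
keep <- cd <= 4 / n
loans_trimmed <- loans[keep, ]

# Refit on the trimmed data.
fit_trimmed <- lm(default_flag ~ variable_1 + variable_2, data = loans_trimmed)
```

Whether a flagged record should actually be dropped is a judgment call; it is usually worth inspecting the flagged rows before removing them.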

Linear:

step(lm(default_flag ~ variable_1 + variable_2 + ...))

Logistic:

glm(default_flag ~ variable_1 + variable_2 + ..., family = binomial)
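To make the logistic form concrete for the 1000-banks scenario in the question, here is a minimal sketch that treats each month's fraud count as one predictor and returns a score between 0 and 1 for a new bank. Everything in it is an assumption for illustration: the data are simulated, the column names `month_1` to `month_12` are hypothetical, and the link between a rising trend and default is invented.

```r
# Simulated data: 1000 banks, 12 monthly fraud counts each (purely illustrative).
set.seed(42)
n_banks <- 1000
monthly <- matrix(rpois(n_banks * 12, lambda = 5), nrow = n_banks)
colnames(monthly) <- paste0("month_", 1:12)
banks <- as.data.frame(monthly)

# Hypothetical target: banks whose counts rise late in the year default more often.
late_minus_early <- rowMeans(monthly[, 7:12]) - rowMeans(monthly[, 1:6])
banks$default_flag <- rbinom(n_banks, 1, plogis(-1 + 0.8 * late_minus_early))

# Logistic regression: family = binomial puts predictions on the 0-1 scale.
fit <- glm(default_flag ~ ., data = banks, family = binomial)

# Score a "new" bank (here just the first simulated row, without its label).
new_bank <- banks[1, paste0("month_", 1:12)]
risk_score <- predict(fit, newdata = new_bank, type = "response")
```

With `type = "response"`, `predict()` returns a fitted probability, so values near 1 indicate a higher estimated chance of default. Without underlying risk variables, a model like this only sees the monthly counts.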

The other option is conditional inference trees. I would use the partykit package with the ctree() function. This buckets defaults based on the statistical significance of the variables within each bucket.

plot(ctree(default_flag ~ variable_1 + variable_2, data = your_data,
           control = ctree_control()))  # look up the ctree_control() settings that suit your model
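A small self-contained sketch of the tree approach, assuming the partykit package is installed. The data are simulated, the variable names are hypothetical, and the `alpha` and `minbucket` values are arbitrary choices rather than recommendations.

```r
library(partykit)  # assumes the partykit package is installed

# Simulated loan data (purely illustrative; all names are hypothetical).
set.seed(1)
n <- 500
loans <- data.frame(
  variable_1 = rnorm(n),
  variable_2 = factor(sample(c("a", "b"), n, replace = TRUE))
)
# Hypothetical outcome driven mostly by variable_1; a factor makes this a classification tree.
loans$default_flag <- factor(rbinom(n, 1, plogis(-1 + 1.5 * loans$variable_1)),
                             levels = c(0, 1), labels = c("no", "yes"))

# alpha is the significance level a split must reach; minbucket is the minimum node size.
tree <- ctree(default_flag ~ variable_1 + variable_2, data = loans,
              control = ctree_control(alpha = 0.05, minbucket = 20))

# Each terminal node is a bucket; type = "prob" returns the class shares of the bucket.
default_prob <- predict(tree, newdata = loans, type = "prob")[, "yes"]
```

Calling `plot(tree)` then draws the buckets, and `default_prob` gives each record a value between 0 and 1 taken from the bucket it falls into.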

Also, if you are concerned about time to default as well, look into survival analysis.
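One common way to do this is a Cox proportional hazards model; the answer does not name a specific method, so treat the choice as an assumption. The sketch uses the survival package with simulated data, where `months_observed`, `defaulted`, and `variable_1` are hypothetical names and loans that have not defaulted by month 12 are treated as censored.

```r
library(survival)  # assumes the survival package is available

# Simulated loans (purely illustrative; all names are hypothetical).
set.seed(3)
n <- 300
loans <- data.frame(variable_1 = rnorm(n))

# Hypothetical time to default: higher variable_1 means earlier default.
true_time <- rexp(n, rate = 0.1 * exp(0.7 * loans$variable_1))

# Observation stops at 12 months; loans still alive then are censored (defaulted = 0).
loans$months_observed <- pmin(true_time, 12)
loans$defaulted <- as.integer(true_time <= 12)

# Cox model: how the hazard of default changes with variable_1.
cox_fit <- coxph(Surv(months_observed, defaulted) ~ variable_1, data = loans)
hazard_ratio <- exp(coef(cox_fit))
```

A hazard ratio above 1 for a variable means that higher values of it are associated with earlier default.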

I have had success with all three. If all you have is a time period and a default total, you can't really do much with that, since you wouldn't have the underlying variables.

