
Suggestions for feature engineering

I am running into a problem during feature engineering and am looking for suggestions. Problem statement: I have usage data for multiple customers covering 3 days. Some customers have only 1 day of usage, some have 2, and some have 3. The data records daily activity such as the number of emails sent and the number of contacts added each day.

I am converting this time series data to a column-wise layout, i.e., the number of emails sent by a customer on day 1 is one feature, the number of emails sent on day 2 is another feature, and so on. The problem is that usage can be either increasing or decreasing for different customers.
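To make the column-wise layout concrete, here is a minimal sketch of that conversion. The log format, the sample values, and the names `usage_log`, `to_wide`, and `emails_dayN` are all assumptions made for illustration; days on which a customer has no record are kept as `None` rather than guessed to be 0.

```python
from collections import defaultdict

# Hypothetical long-format usage log: (customer, day, emails_sent).
usage_log = [
    ("A", 1, 100), ("A", 2, 0),
    ("B", 1, 0), ("B", 2, 100), ("B", 3, 40),
    ("C", 1, 5),
]
DAYS = (1, 2, 3)


def to_wide(rows, days=DAYS):
    """Return {customer: {"emails_day1": ..., ...}}; missing days are None."""
    per_customer = defaultdict(dict)
    for customer, day, emails in rows:
        per_customer[customer][day] = emails
    return {
        customer: {f"emails_day{d}": by_day.get(d) for d in days}
        for customer, by_day in per_customer.items()
    }


wide = to_wide(usage_log)
```

Keeping missing days distinct from zero usage matters here, because a customer with only 1 day of data is not the same as a customer who sent nothing on days 2 and 3.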

i.e., example 1: customer 'A' --> 'number of emails sent on 1st day' = 100, 'number of emails sent on 2nd day' = 0

example 2: customer 'B' --> 'number of emails sent on 1st day' = 0, 'number of emails sent on 2nd day' = 100

example 3: customer 'C' --> 'number of emails sent on 1st day' = 0, 'number of emails sent on 2nd day' = 0

example 4: customer 'D' --> 'number of emails sent on 1st day' = 100, 'number of emails sent on 2nd day' = 100

In the first two cases, my new feature (the day 2 count minus the day 1 count) will have "-100" and "100" as values, which I guess is good for differentiating. But the problem arises for the 3rd and 4th examples, where the new feature value will be "0" in both scenarios. Can anyone suggest a way to handle this?
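The collision can be reproduced in a few lines. This is only a sketch, assuming the new feature is the day 2 count minus the day 1 count, which is what the values "-100" and "100" suggest; the dictionary names are made up for the example.

```python
# The four customers from the examples above: (day 1, day 2) emails sent.
emails = {"A": (100, 0), "B": (0, 100), "C": (0, 0), "D": (100, 100)}

# Assumed definition of the new feature: day 2 minus day 1.
diff = {customer: d2 - d1 for customer, (d1, d2) in emails.items()}
print(diff)  # {'A': -100, 'B': 100, 'C': 0, 'D': 0}
```

The difference captures the direction of change but discards the level of usage, so an inactive customer (C) and a steadily active one (D) end up with the same value.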

You can extract the following features:

  1. Simple moving averages for day 2 and day 3 respectively. This means you now have two extra columns.

  2. Percentage change from the previous day

  3. Percentage change from day 1 to day 3
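The three suggestions above can be sketched as follows. Several details are assumptions rather than part of the answer: the moving average uses a trailing 2-day window, missing days (`None`) are skipped when averaging, and a percentage change with a zero or missing base is returned as `None` because it is undefined. The function and feature names are hypothetical.

```python
def pct_change(old, new):
    """Percentage change from old to new; None if the base is zero or missing."""
    if old is None or new is None or old == 0:
        return None
    return (new - old) / old * 100.0


def mean(values):
    """Mean of the non-missing values, or None if every value is missing."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def engineer(day1, day2, day3):
    """Build the suggested features from one customer's three daily counts."""
    return {
        # Assumption: trailing 2-day simple moving average.
        "sma_day2": mean([day1, day2]),
        "sma_day3": mean([day2, day3]),
        "pct_change_day1_to_day2": pct_change(day1, day2),
        "pct_change_day2_to_day3": pct_change(day2, day3),
        "pct_change_day1_to_day3": pct_change(day1, day3),
    }


# Customers C and D from the question, with no data for day 3.
features_c = engineer(0, 0, None)
features_d = engineer(100, 100, None)
```

With these definitions, `features_c["sma_day2"]` is 0 and `features_d["sma_day2"]` is 100, so the moving average separates the two customers that the plain difference could not. The percentage change alone would not be enough for customer C, since the day 1 base is zero.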

