H2O Autoencoder Anomaly only accepting numerical predictors

I am using the H2O autoencoder anomaly detection to find outliers in my data, but the issue is that the autoencoder only accepts numerical predictors. My requirement is to find outliers based on CardNumber or merchant number. CardNumber is a 12-digit value (e.g. 342178901244) and mostly unique, so it is nominal data. We cannot one-hot encode it either, because that would create as many new fields as there are unique card numbers. Please suggest a way to include categorical data and still run the autoencoder.

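import numpy as np
from h2o.estimators.deeplearning import H2OAutoEncoderEstimator

# predictors, train_hex and test_hex (H2OFrames) are assumed to be defined earlier
model = H2OAutoEncoderEstimator(activation="Tanh",
                                hidden=[70],
                                ignore_const_cols=False,
                                epochs=40)

model.train(x=predictors, training_frame=train_hex)

# Get anomalous values (per_feature=False yields a single Reconstruction.MSE column per row)
test_rec_error = model.anomaly(test_hex, per_feature=False)
train_rec_error = model.anomaly(train_hex, per_feature=False)
recon_error_df = test_rec_error.as_data_frame()
# top_whisker was not defined in the original; assumed here to be the boxplot upper whisker of the training error
q1, q3 = np.percentile(train_rec_error.as_data_frame()['Reconstruction.MSE'], [25, 75])
top_whisker = q3 + 1.5 * (q3 - q1)
recon_error_df['outlier'] = np.where(recon_error_df['Reconstruction.MSE'] > top_whisker, 'outlier', 'no_outlier')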

You can't feed an almost-unique categorical feature into a model (an autoencoder or anything else) and expect it to work.

Instead you need to extract meaningful features from it, and which ones depends on the problem you want to solve. For example, if it is a credit card number you could add a feature encoding the card circuit (VISA, Mastercard, American Express, ...).
The only limit is your imagination and your knowledge of the domain.
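As a rough illustration of the idea, here is a minimal sketch that turns a raw card number into two low-cardinality or numeric features: a coarse card-circuit label and the number of times the card appears in the data. The function names (`card_network`, `card_features`) and the feature names are made up for this example, and the prefix rules are deliberately simplified (4 for VISA, 34/37 for American Express, 51-55 and 2221-2720 for Mastercard); a real system would use a proper BIN table. The per-card count is just one possible domain feature, not something the question prescribes.

```python
from collections import Counter


def card_network(card_number):
    """Map a card number to a coarse circuit label using simplified prefix rules."""
    digits = str(card_number)
    if digits.startswith("4"):
        return "visa"
    if digits[:2] in {"34", "37"}:
        return "amex"
    if digits[:2].isdigit() and 51 <= int(digits[:2]) <= 55:
        return "mastercard"
    if len(digits) >= 4 and digits[:4].isdigit() and 2221 <= int(digits[:4]) <= 2720:
        return "mastercard"
    return "other"


def card_features(card_numbers):
    """Build one feature dict per transaction from the raw card numbers."""
    counts = Counter(str(c) for c in card_numbers)  # how often each card appears
    return [
        {"card_network": card_network(c), "card_txn_count": counts[str(c)]}
        for c in card_numbers
    ]


# Hypothetical sample data; the first value is the number from the question.
features = card_features(
    ["342178901244", "4111111111111111", "342178901244", "5500005555555559"]
)
for row in features:
    print(row)
```

The circuit label has only a handful of levels, so one-hot encoding it is cheap, and the count is already numeric; both can then be passed as predictors in place of the raw CardNumber column.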
