
Feature Selection in TensorFlow

In the TensorFlow documentation, it is mentioned that "Through dense embeddings, deep models can generalize better and make predictions on feature pairs that were previously unseen in the training data."

How can we use dense embeddings in code and obtain the new features that TensorFlow derives through generalization and memorization?

Or, to put it another way: how can TensorFlow be used as a feature selection algorithm?


TensorFlow performs both feature selection and feature transformation when a model is optimized, with suitable regularization, to generalize against a validation set. However, the selected input features are hard to discover in dense, deep models. Wide (single-layer) models give a sense of which features are most relevant to the problem: they act as a logistic regression layer, with the edge weights representing the relative importance of the features. Ablation is another way to do feature selection: train the network with one feature removed at a time and note the drop in the evaluation metric.
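The ablation idea above can be sketched independently of TensorFlow. The following is a minimal NumPy illustration, assuming a plain logistic-regression model and a synthetic dataset where feature 0 is informative and feature 1 is pure noise; removing each feature in turn and retraining reveals which one the metric depends on:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: feature 0 drives the label, feature 1 is pure noise.
n = 2000
X = rng.normal(size=(n, 2))
y = (X[:, 0] + 0.1 * rng.normal(size=n) > 0).astype(float)

def train_logreg(X, y, lr=0.5, steps=300):
    """Plain gradient-descent logistic regression; returns the weight vector."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def accuracy(X, y, w):
    return np.mean(((X @ w) > 0) == (y > 0.5))

base_acc = accuracy(X, y, train_logreg(X, y))

# Ablation: retrain with each feature removed and note the metric drop.
for j in range(X.shape[1]):
    X_ablated = np.delete(X, j, axis=1)
    acc = accuracy(X_ablated, y, train_logreg(X_ablated, y))
    print(f"feature {j}: accuracy drop {base_acc - acc:.3f}")
```

Removing the informative feature should produce a large accuracy drop, while removing the noise feature should barely change the metric; ranking features by that drop is the ablation-based importance the answer describes.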

Dense embeddings can be viewed as complex transformations of the input features, fit to the training examples. They do not necessarily generalize well to unseen data in your test set. They are obtained by defining feature columns with tf.contrib.layers.embedding_column.
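The API named above is from the TF 1.x contrib namespace; conceptually, an embedding column maps a sparse categorical id to a row of a dense, trainable table. A minimal NumPy sketch of that lookup (the table here is random for illustration, whereas in TensorFlow it is a variable learned end-to-end):

```python
import numpy as np

rng = np.random.default_rng(1)

vocab_size, embed_dim = 10, 4  # illustrative sizes, not from the original answer

# In TensorFlow this table is a trainable variable; here it is just random.
embedding_table = rng.normal(size=(vocab_size, embed_dim))

# A batch of sparse categorical ids (e.g. word or category indices).
ids = np.array([2, 7, 2])

# The "embedding lookup": each id becomes a dense vector (a row of the table).
dense = embedding_table[ids]
print(dense.shape)  # (3, 4)
```

Because equal ids map to the same row, the model sees a shared dense representation for each category; it is these learned rows, not the raw sparse ids, that the deep part of a wide-and-deep model consumes.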


 