HOWTO tf.estimator with continuous and categorical columns

Question

I have a tf.estimator that works for continuous variables, and I want to extend it to use categorical variables.

Consider a pandas dataframe that looks like this:

label           |  con_col          |  cat_col
(float 0 or 1)  |  (float -1 to 1)  |  (int 0-3)
----------------+-------------------+---------------
0               |   0.123           |  2
0               |   0.456           |  1
1               |  -0.123           |  3
1               |  -0.123           |  3
0               |   0.123           |  2

To build the estimator for just the label and the continuous column (con_col), I build the following feature_cols list:

feature_cols = [
    tf.feature_column.numeric_column('con_col')
]

Then I pass it to the DNNClassifier like so:

tf.estimator.DNNClassifier(feature_columns=feature_cols ...)

Later I build a serving_input_fn(). In this function I also specify the columns. The routine is quite small and looks like this:

def serving_input_fn():
    # Map each feature name to the placeholder that receives it at serving time
    feat_placeholders = {}
    feat_placeholders['con_col'] = tf.placeholder(tf.float32, [None])

    return tf.estimator.export.ServingInputReceiver(feat_placeholders.copy(), feat_placeholders)

This works. However, when I try to use the categorical column, I run into a problem.

With the categorical column, this part seems to work:

feature_cols = [
    tf.feature_column.sequence_categorical_column_with_identity('cat_col', num_buckets=4)
]
tf.estimator.DNNClassifier(feature_columns=feature_cols ...)

For serving_input_fn(), I get suggestions from the stack trace, but both suggestions fail:

def serving_input_fn(): 
    # try #2
    # this fails
    feat_placeholders['cat_col'] = tf.SequenceCategoricalColumn(categorical_column=tf.IdentityCategoricalColumn(key='cat_col', number_buckets=4,default_value=None))

    # try #1
    # this also fails
    # feat_placeholders['cat_col'] = tf.feature_column.indicator_column(tf.feature_column.sequence_categorical_column_with_identity(column, num_buckets=4))

    # try #0
    # this fails. It's using the same form as for con_col;
    # the resulting error gave hints for the above code.
    # Note: I'm using this URL as a guide. My cat_col is
    # similar to that code sample's 'dayofweek', except it
    # is not a string.
    # https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/feateng/taxifare_tft/trainer/model.py
    #feat_placeholders['cat_col'] = tf.placeholder(tf.float32, [None])


    return tf.estimator.export.ServingInputReceiver(feat_placeholders.copy(), feat_placeholders)

This is the error message when try #0 is used:

ValueError: Items of feature_columns must be a <class 'tensorflow.python.feature_column.feature_column_v2.DenseColumn'>. You can wrap a categorical column with an embedding_column or indicator_column. Given: SequenceCategoricalColumn(categorical_column=IdentityCategoricalColumn(key='cat_col', number_buckets=4, default_value=None))

Implementation of Lak's answer

Using Lak's answer as a guide, this works for both feature columns:

# This is the list of features we pass as an argument to DNNClassifier
feature_cols = []

# Add the continuous column first
feature_cols.append(tf.feature_column.numeric_column('con_col'))                  

# Add the categorical column, wrapped in an indicator_column.
# The wrapper one-hot encodes the single integer column into num_buckets values.
category_feature_cols = [tf.feature_column.categorical_column_with_identity('cat_col', num_buckets=4)]
for c in category_feature_cols:
    feature_cols.append(tf.feature_column.indicator_column(c))

# now pass this list to the DNN
tf.estimator.DNNClassifier(feature_columns=feature_cols ...)


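def serving_input_fn():
    feat_placeholders = {}
    feat_placeholders['con_col'] = tf.placeholder(tf.float32, [None])
    feat_placeholders['cat_col'] = tf.placeholder(tf.int64, [None])
    return tf.estimator.export.ServingInputReceiver(feat_placeholders.copy(), feat_placeholders)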
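
To make the effect of the two feature columns concrete, here is a minimal pure-Python sketch (no TensorFlow) of roughly what the network receives per row: the numeric con_col value plus a 4-wide one-hot vector for cat_col. The helper names one_hot and dense_row are hypothetical, the ordering of values inside the vector is an arbitrary choice for illustration, and rejecting out-of-range ids is a simplification rather than the library's exact behavior.

```python
# Pure-Python illustration only; this does not call TensorFlow.
NUM_BUCKETS = 4  # matches num_buckets=4 used for cat_col


def one_hot(category_id, num_buckets=NUM_BUCKETS):
    """Return a one-hot list for an integer id in [0, num_buckets)."""
    if not 0 <= category_id < num_buckets:
        # Simplification: this sketch rejects out-of-range ids outright.
        raise ValueError(f"cat_col id {category_id} outside [0, {num_buckets})")
    return [1.0 if i == category_id else 0.0 for i in range(num_buckets)]


def dense_row(con_col, cat_col):
    """Concatenate the numeric value with the one-hot encoded category."""
    return [float(con_col)] + one_hot(cat_col)


# First three rows of the example table: (con_col, cat_col)
rows = [(0.123, 2), (0.456, 1), (-0.123, 3)]
dense = [dense_row(con, cat) for con, cat in rows]

for row in dense:
    print(row)
```

Each row becomes a 5-element dense vector, which is why the categorical column has to be wrapped before a DNN can consume it.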

Answer 1

You need to wrap categorical columns before sending them to the DNN:

# Use the non-sequence identity column; DNNClassifier cannot consume a sequence column wrapped this way
cat_feature_cols = [tf.feature_column.categorical_column_with_identity('cat_col', num_buckets=4)]
feature_cols = [tf.feature_column.indicator_column(c) for c in cat_feature_cols]

Use an indicator column to one-hot encode, or an embedding column to embed.
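
As a rough illustration of the difference between the two wrappers, the pure-Python sketch below (no TensorFlow) treats an embedding as a row lookup in a weight table and shows that this equals multiplying the one-hot vector by that table. The table values, EMBEDDING_DIM, and the function names are hypothetical; in a real model the table is learned during training.

```python
# Pure-Python illustration only; this does not call TensorFlow.
NUM_BUCKETS = 4    # matches num_buckets=4 used for cat_col
EMBEDDING_DIM = 2  # hypothetical embedding width

# Hypothetical embedding table: one row per category id.
EMBEDDING_TABLE = [
    [0.1, -0.2],
    [0.0, 0.5],
    [-0.3, 0.3],
    [0.7, 0.1],
]


def one_hot(category_id):
    """Indicator-style encoding: NUM_BUCKETS values, exactly one set to 1.0."""
    return [1.0 if i == category_id else 0.0 for i in range(NUM_BUCKETS)]


def embed(category_id):
    """Embedding-style encoding: look up the row for this category id."""
    return list(EMBEDDING_TABLE[category_id])


def one_hot_times_table(category_id):
    """Multiply the one-hot vector by the table; selects the same row."""
    hot = one_hot(category_id)
    return [
        sum(hot[r] * EMBEDDING_TABLE[r][c] for r in range(NUM_BUCKETS))
        for c in range(EMBEDDING_DIM)
    ]


for cat in range(NUM_BUCKETS):
    print(cat, one_hot(cat), embed(cat))
```

With only four categories the one-hot form is cheap and has nothing extra to learn; an embedding tends to pay off when the number of buckets is large.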
