Categorical and continuous cross feature column in TensorFlow

When using TensorFlow's estimators and feature_column, it is possible to cross a categorical column with a bucketized continuous column using a crossed column, but not to cross a categorical column with a numeric one. Could this functionality be implemented starting from https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/feature_column/feature_column.py#L704 ?

It would also be great to see any alternative methods for achieving the same outcome within the TensorFlow graph.

import numpy as np

cont = np.array([1,2,3])
cat = np.array(['cat', 'dog', 'cat'])

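# Desired result of a hypothetical cross_function(cat, cont):
# one column per category, holding the continuous value where the category matches.
expected = np.array([[1, 0], [0, 2], [3, 0]])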

Answering my own question, the steps involved are:

  1. Numerically encoding the categorical feature
    • This is done within the graph, so it works in both training and serving
  2. One-hot encoding the resulting integer index
  3. Multiplying the one-hot vector by the continuous variable
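
As a rough illustration of the three steps above, here is a minimal sketch in plain Python, with no TensorFlow, so the arithmetic is easy to follow. The names vocabulary, index_of and one_hot are made up for this example and are not part of any library; the sorted vocabulary is an assumption that happens to match what np.unique would return.

```python
# Plain-Python sketch of the three steps; all names here are illustrative only.
cont = [1.0, 2.0, 3.0]
cat = ["cat", "dog", "cat"]

# Step 1: numerically encode the categories (sorted vocabulary -> integer index).
vocabulary = sorted(set(cat))
index_of = {name: i for i, name in enumerate(vocabulary)}
encoded = [index_of[name] for name in cat]

# Step 2: one-hot encode each integer index.
def one_hot(index, depth):
    return [1.0 if i == index else 0.0 for i in range(depth)]

onehot = [one_hot(i, len(vocabulary)) for i in encoded]

# Step 3: scale each one-hot row by the matching continuous value.
cross = [[flag * value for flag in row] for row, value in zip(onehot, cont)]

print(cross)  # [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
```

The result has one column per category, and each row carries the continuous value in the column of its own category, which is the output asked for in the question.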

Code:

import numpy as np
import tensorflow as tf

cont = np.array([1,2,3])
cat = np.array(['cat', 'dog', 'cat'])
categories = np.unique(cat)

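def categorical_continuous_interaction(categorical_onehot, continuous):
    # Scale each one-hot row by its continuous value. The transposes let the
    # continuous values broadcast across the category axis.
    # Cast so integer inputs match the float dtype produced by tf.one_hot.
    continuous = tf.cast(continuous, categorical_onehot.dtype)
    continuous = tf.expand_dims(continuous, 0)
    return tf.transpose(tf.multiply(tf.transpose(categorical_onehot), continuous))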

def transformation_function(feature_dictionary, mapping_table):
    continuous_feature = feature_dictionary['cont']

    # Step 1: map each category string to its integer index within the graph
    categorical_feature = mapping_table.lookup(feature_dictionary['cat'])
    # Step 2: one-hot encode the index
    onehot = tf.one_hot(categorical_feature, categories.shape[0])
    # Step 3: multiply the one-hot vector by the continuous value
    cross_feature = categorical_continuous_interaction(onehot, continuous_feature)

    return {'feature_name': cross_feature}

def input_function(dataframe, label_key, ...):
    # categorical mapping tables, these must be generated outside of the dataset 
    # transformation function but within the input function
    mapping_table = tf.contrib.lookup.index_table_from_tensor(
        mapping=tf.constant(categories),
        num_oov_buckets=0, 
        default_value=-1
    )

    # Generate the dataset of a dictionary of all of the dataframes columns
    dataset = tf.data.Dataset.from_tensor_slices(dict(dataframe))
    # Convert to a dataset of tuples of dicts with the labels as one tuple
    dataset = dataset.map(lambda x: split_label(x, label_key))
    # Transform the features dict within the dataset
    dataset = dataset.map(lambda features, labels: (transformation_function(
        features, mapping_table=mapping_table), labels))

    ...

    return dataset

def serving_input_fn():
    # categorical mapping tables, these must be generated outside of the dataset 
    # transformation function but within the input function
    mapping_table = tf.contrib.lookup.index_table_from_tensor(
        mapping=tf.constant(categories),
        num_oov_buckets=0, 
        default_value=-1
    )
    numeric_receiver_tensors = {
        name: tf.placeholder(dtype=tf.float32, shape=[1], name=name+"_placeholder")
        for name in numeric_feature_column_names
    }
    categorical_receiver_tensors = {
        name: tf.placeholder(dtype=tf.string, shape=[1], name=name+"_placeholder")
        for name in categorical_feature_column_names
    }
    receiver_tensors = {**numeric_receiver_tensors, **categorical_receiver_tensors}

    # Apply the same transformation as in training, with the same table
    features = transformation_function(receiver_tensors,
        mapping_table=mapping_table)

    return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)
