How do I return a dictionary with multiple features from a tf.data.Dataset created from a generator?

Question

I have a sample dataset that looks like this:

feature_1    feature_2    label
4            5            1
4            3            1
4            6            2
...

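I created a tf.feature_column.embedding_column for each feature (feature_1 and feature_2), so train_input_fn has to return a dictionary of features whose keys have the same names as those features. My input function is as follows: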

def train_input_fn(features, labels, output_types, output_shapes, batch_size, feature_names):
    """
    Provides the data pipeline for the training process.
    :param features: (numpy.array) A numpy array that holds the training features.
    :param labels: (numpy.array) A numpy array that holds the target variable.
    :param output_types: (tuple(tensorflow.DType)) A tuple containing the data type of each component yielded.
    :param output_shapes: (tuple(tensorflow.TensorShape)) A tuple containing the shape of each component yielded.
    :param batch_size: (int) The size of every batch.
    :return: (dict, int) A dictionary of key -> value for every feature and the target label.
    """
    def gen():
        for f, l in zip(features, labels):
            yield f, l

    ds = tf.data.Dataset.from_generator(gen, output_types, output_shapes)
    # Calling repeat() without an argument creates an infinite dataset.
    # That is preferred, we can now control the iterations via epochs.
    ds = ds.repeat().batch(batch_size)
    feature, label = ds.make_one_shot_iterator().get_next()

    return {'feature': feature}, label

How can I return something like this instead:

{'feature_1': x_1, 'feature_2': x_2}

Answer 1

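These few changes should do it. Assuming each row of features holds the two feature values, the batched feature tensor has one column per feature, so slice out each column and key it by its feature name: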

def train_input_fn(features, labels, output_types, output_shapes, batch_size, feature_names):
    """
    Provides the data pipeline for the training process.
    :param features: (numpy.array) A numpy array that holds the training features.
    :param labels: (numpy.array) A numpy array that holds the target variable.
    :param output_types: (tuple(tensorflow.DType)) A tuple containing the data type of each component yielded.
    :param output_shapes: (tuple(tensorflow.TensorShape)) A tuple containing the shape of each component yielded.
    :param batch_size: (int) The size of every batch.
    :return: (dict, int) A dictionary of key -> value for every feature and the target label.
    """
    def gen():
        for f, l in zip(features, labels):
            yield f, l

    ds = tf.data.Dataset.from_generator(gen, output_types, output_shapes)
    # Calling repeat() without an argument creates an infinite dataset.
    # That is preferred, we can now control the iterations via epochs.
    ds = ds.repeat().batch(batch_size)
    feature, label = ds.make_one_shot_iterator().get_next()

    return {'feature_1': feature[:, 0], 'feature_2': feature[:, 1]}, label
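
An alternative to slicing after batching is to build the dictionary inside the generator itself, using the otherwise unused feature_names argument. The following is only a minimal sketch in plain Python: make_dict_generator is a hypothetical helper name, not part of TensorFlow, and the sample rows are taken from the table in the question.

```python
def make_dict_generator(features, labels, feature_names):
    """Return a generator function yielding ({name: value, ...}, label) per row.

    Hypothetical helper for illustration; it only restructures the data.
    """
    def gen():
        for row, label in zip(features, labels):
            # Pair each column value in the row with its feature name.
            yield dict(zip(feature_names, row)), label
    return gen


# Rows taken from the sample table in the question.
features = [[4, 5], [4, 3], [4, 6]]
labels = [1, 1, 2]

gen = make_dict_generator(features, labels, ["feature_1", "feature_2"])
first_features, first_label = next(gen())
print(first_features, first_label)  # {'feature_1': 4, 'feature_2': 5} 1
```

A generator built this way could then be passed to tf.data.Dataset.from_generator, with output_types and output_shapes given as the same nested (dict, label) structure, for example ({'feature_1': tf.int64, 'feature_2': tf.int64}, tf.int64). The dtypes here are assumptions and should match the actual data.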

Solution 1
0 2018-06-26 16:49:55
