How do I use TensorFlow tf.data.Dataset flat_map to produce a derived dataset?

Question

I have a Pandas DataFrame, and I load part of it into a tf.data Dataset:

dataset = tf.data.Dataset.from_tensor_slices((
    df.StringColumn.values,
    df.IntColumn1.values,
    df.IntColumn2.values,
))

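What I'd like to do now is use the Dataset's flat_map method to produce a derived dataset that takes the data in each row and emits several rows in the derived dataset for each row of the original.

But flat_map seems to pass only placeholder tensors to the lambda function.

In case it matters, I'm using TensorFlow 2.0 alpha 0.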

Edit:

What I want is to be able to write something like this:

derived = dataset.flat_map(replicate)

def replicate(s, i1, i2):
    return [[0, s, i1, i2],
            [0.25, s, i1, i2],
            [0.5, s, i1, i2],
            [0.75, s, i1, i2]]

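...so that derived would be a dataset with four columns and four times as many rows as dataset.

However, when I try this, s is not a value; it is a string placeholder tensor.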
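For context: flat_map traces its function as a graph, so the arguments are symbolic tensors rather than Python values, and the function has to return a tf.data.Dataset rather than a list. A minimal sketch of replicate under those constraints is below. The toy column values and the FRACTIONS name are made up for illustration, and tf.fill is just one way to repeat the row values.

```python
import tensorflow as tf

# Toy stand-in for the three DataFrame columns (values are made up).
dataset = tf.data.Dataset.from_tensor_slices((
    ["a", "b"],
    [1, 2],
    [10, 20],
))

FRACTIONS = [0.0, 0.25, 0.5, 0.75]  # hypothetical name, not from the question

def replicate(s, i1, i2):
    # s, i1 and i2 are symbolic scalar tensors here, so the output is built
    # with TensorFlow ops and wrapped in a Dataset of len(FRACTIONS) rows.
    n = len(FRACTIONS)
    return tf.data.Dataset.from_tensor_slices((
        tf.constant(FRACTIONS),
        tf.fill([n], s),
        tf.fill([n], i1),
        tf.fill([n], i2),
    ))

derived = dataset.flat_map(replicate)

# Each source row yields four consecutive derived rows.
for frac, s, i1, i2 in derived:
    print(frac.numpy(), s.numpy(), i1.numpy(), i2.numpy())
```

The derived dataset here has four columns and four times as many rows as the source, with the four rows for each source row kept together.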

Edit 2:

OK, what I mean is that the replicate function needs to know the values of the row it is replicating:

slice_count = 16

def price(frac, total, size0, price0, size1, price1, size2, price2, size3, price3):
    total_per_slice = total / slice_count
    start = frac * total_per_slice
    finish = start + total_per_slice
    # Sum, over the four tiers, of tier price times the part of [start, finish) inside the tier.
    return (
        (price0 * (min(finish, size0) - max(start, 0) if 0 < finish and start < size0 else 0)) +
        (price1 * (min(finish, size1) - max(start, size0) if size0 < finish and start < size1 else 0)) +
        (price2 * (min(finish, size2) - max(start, size1) if size1 < finish and start < size2 else 0)) +
        (price3 * (min(finish, size3) - max(start, size2) if size2 < finish and start < size3 else 0))
    )

def replicate(size0, price0, size1, price1, size2, price2, size3, price3):
    total = size0 + size1 + size2 + size3
    return [[
        price(frac, total, size0, price0, size1, price1, size2, price2, size3, price3),
        frac / slice_count] for frac in range(slice_count)]

derived = dataset.flat_map(replicate)

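Merely being able to pass placeholders around isn't enough. Is this something I can do, perhaps by translating it into TensorFlow's computation graph, or is the way I'm trying to do it simply not feasible?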
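One possible way to express this in graph-compatible form is to replace the Python min, max and if with tf.minimum, tf.maximum and clipping at zero, and to compute all slices at once. The sketch below makes several assumptions that are not in the question: the columns are float32, size0 to size3 are non-decreasing tier boundaries (as the comparisons suggest), and the single toy row is invented. Under those assumptions the clipped overlap equals the conditional expression in price.

```python
import tensorflow as tf

slice_count = 16

def replicate(size0, price0, size1, price1, size2, price2, size3, price3):
    total = size0 + size1 + size2 + size3
    total_per_slice = total / slice_count
    frac = tf.range(slice_count, dtype=tf.float32)        # shape [16]
    start = frac * total_per_slice
    finish = start + total_per_slice
    # Lower and upper bound of each of the four tiers, shape [4].
    lower = tf.stack([tf.zeros_like(size0), size0, size1, size2])
    upper = tf.stack([size0, size1, size2, size3])
    prices = tf.stack([price0, price1, price2, price3])
    # overlap[i, j] is the length of slice i that falls inside tier j.
    overlap = tf.maximum(
        tf.minimum(finish[:, None], upper[None, :])
        - tf.maximum(start[:, None], lower[None, :]),
        0.0,
    )
    price = tf.reduce_sum(overlap * prices[None, :], axis=1)  # shape [16]
    # One derived row per slice: (price, frac / slice_count).
    return tf.data.Dataset.from_tensor_slices((price, frac / slice_count))

# Invented single-row dataset: boundaries 10, 20, 30, 40 and prices 1, 2, 3, 4.
dataset = tf.data.Dataset.from_tensor_slices((
    [10.0], [1.0], [20.0], [2.0], [30.0], [3.0], [40.0], [4.0],
))
derived = dataset.flat_map(replicate)

for price, frac in derived.take(2):
    print(price.numpy(), frac.numpy())
```

If the logic cannot be vectorized like this, wrapping the plain Python function with tf.py_function inside a map is another option, at the cost of running outside the graph.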

Answer 1

This may be the long way around, but you can also combine .concatenate() with apply() to get the effect of a "flat map".

Something like this:

def replicate(ds):
  return (ds.map(lambda s,i1,i2: (s, i1, i2, tf.constant(0.0)))
          .concatenate(ds.map(lambda s,i1,i2: (s, i1, i2, tf.constant(0.25))))
          .concatenate(ds.map(lambda s,i1,i2: (s, i1, i2, tf.constant(0.5))))
          .concatenate(ds.map(lambda s,i1,i2: (s, i1, i2, tf.constant(0.75)))))

derived = dataset.apply(replicate)

That should give you the output you expect.
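One thing worth checking with this approach is row order: concatenate appends whole passes over the source, so all the 0.0 rows come first, then all the 0.25 rows, and so on, rather than four consecutive rows per source row. A small check on made-up data, reusing the replicate function from above (the toy values are an assumption for illustration):

```python
import tensorflow as tf

# Toy stand-in for the three DataFrame columns (values are made up).
dataset = tf.data.Dataset.from_tensor_slices((
    ["a", "b"],
    [1, 2],
    [10, 20],
))

def replicate(ds):
    # Four full passes over ds, each tagged with a different constant.
    return (ds.map(lambda s, i1, i2: (s, i1, i2, tf.constant(0.0)))
            .concatenate(ds.map(lambda s, i1, i2: (s, i1, i2, tf.constant(0.25))))
            .concatenate(ds.map(lambda s, i1, i2: (s, i1, i2, tf.constant(0.5))))
            .concatenate(ds.map(lambda s, i1, i2: (s, i1, i2, tf.constant(0.75)))))

derived = dataset.apply(replicate)

# Rows arrive grouped by constant: a, b, a, b, ...
for s, i1, i2, frac in derived:
    print(s.numpy(), i1.numpy(), i2.numpy(), frac.numpy())
```

If the rows are shuffled later anyway, the ordering probably does not matter; if it does, a flat_map whose function returns a Dataset keeps each source row's copies together.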
