How to use TensorFlow tf.data.Dataset flat_map to produce a derived dataset?

I have a Pandas DataFrame, and I'm loading part of it into a tf.data Dataset:

dataset = tf.data.Dataset.from_tensor_slices((
    df.StringColumn.values,
    df.IntColumn1.values,
    df.IntColumn2.values,
))

Now what I would like to do is to use something like flat_map to produce a derived Dataset that takes the data in each row and produces a bunch of rows in the derived Dataset for each row in the original.

But flat_map seems to just pass me placeholder tensors in the lambda function.

I'm using TensorFlow 2.0 alpha 0 if that matters.

Edit:

What I'd like is to be able to write something like this:

def replicate(s, i1, i2):
    return [[0, s, i1, i2],
            [0.25, s, i1, i2],
            [0.5, s, i1, i2],
            [0.75, s, i1, i2]]

derived = dataset.flat_map(replicate)

... and then have derived be a Dataset with four columns and four times as many rows as dataset.

But when I try this, s isn't a value, it's a string placeholder tensor.
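For what it's worth, flat_map expects the mapped function to return a Dataset rather than a Python list, and the arguments are symbolic tensors because the function is traced into a graph. Here is a minimal sketch of the four-way replication under that model. The sample rows and the FRACS constant are made up for illustration and stand in for the DataFrame columns.

```python
import tensorflow as tf

# Hypothetical stand-in for the three DataFrame columns in the question.
dataset = tf.data.Dataset.from_tensor_slices((
    ["a", "b"],
    [1, 2],
    [10, 20],
))

# Assumed list of fractions, taken from the desired output in the question.
FRACS = [0.0, 0.25, 0.5, 0.75]

def replicate(s, i1, i2):
    # s, i1 and i2 are symbolic scalar tensors here, so the result is built
    # with TensorFlow ops and returned as a Dataset, not as a Python list.
    n = len(FRACS)
    return tf.data.Dataset.from_tensor_slices((
        tf.constant(FRACS),
        tf.fill([n], s),
        tf.fill([n], i1),
        tf.fill([n], i2),
    ))

derived = dataset.flat_map(replicate)

# Eager iteration yields concrete values for each derived row.
rows = [
    (float(f.numpy()), s.numpy().decode(), int(i1.numpy()), int(i2.numpy()))
    for f, s, i1, i2 in derived
]
```

With flat_map, the four derived rows for one source row come out together before the next source row is processed.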

Edit 2:

Okay, what I meant is that the replicate function needs to know the values of the row it's replicating:

slice_count = 16

def price(frac, total, size0, price0, size1, price1, size2, price2, size3, price3):
    total_per_slice = total / slice_count
    start = frac * total_per_slice
    finish = start + total_per_slice
    price = \
        (price0 * (min(finish, size0) - max(start, 0) if 0 < finish and start < size0 else 0)) + \
        (price1 * (min(finish, size1) - max(start, size0) if size0 < finish and start < size1 else 0)) + \
        (price2 * (min(finish, size2) - max(start, size1) if size1 < finish and start < size2 else 0)) + \
        (price3 * (min(finish, size3) - max(start, size2) if size2 < finish and start < size3 else 0))
    return price

def replicate(size0, price0, size1, price1, size2, price2, size3, price3):
    total = size0 + size1 + size2 + size3
    return [[
        price(frac, total, size0, price0, size1, price1, size2, price2, size3, price3),
        frac / slice_count] for frac in range(slice_count)]

derived = dataset.flat_map(replicate)

It's not sufficient to just be able to pass placeholders along. Is this something I can do, or is it doable if I can somehow translate it into TensorFlow's calculation graphs, or is it just not doable the way I'm trying to do it?
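One way this might be translated into graph operations is sketched below. It assumes all eight columns are float32, and it reproduces the arithmetic exactly as written above without judging whether the sizes are meant to be cumulative thresholds. The helper tier_amount, the constant name SLICE_COUNT, and the sample row are my own additions for illustration.

```python
import tensorflow as tf

SLICE_COUNT = 16

def tier_amount(start, finish, lo, hi):
    # Per-tier expression from the question:
    # min(finish, hi) - max(start, lo) if lo < finish and start < hi else 0
    overlaps = tf.logical_and(tf.less(lo, finish), tf.less(start, hi))
    amount = tf.minimum(finish, hi) - tf.maximum(start, lo)
    return tf.where(overlaps, amount, tf.zeros_like(amount))

def replicate(size0, price0, size1, price1, size2, price2, size3, price3):
    total = size0 + size1 + size2 + size3
    total_per_slice = total / SLICE_COUNT
    # One entry per slice, so all 16 prices are computed in a single pass.
    k = tf.range(SLICE_COUNT, dtype=tf.float32)
    start = k * total_per_slice
    finish = start + total_per_slice
    prices = (
        price0 * tier_amount(start, finish, tf.constant(0.0), size0)
        + price1 * tier_amount(start, finish, size0, size1)
        + price2 * tier_amount(start, finish, size1, size2)
        + price3 * tier_amount(start, finish, size2, size3)
    )
    # flat_map needs a Dataset back: one (price, frac / slice_count) row per slice.
    return tf.data.Dataset.from_tensor_slices((prices, k / SLICE_COUNT))

# Hypothetical single-row input: (size0, price0, size1, price1, ...).
row = (4.0, 1.0, 8.0, 2.0, 12.0, 3.0, 16.0, 4.0)
dataset = tf.data.Dataset.from_tensor_slices(
    tuple(tf.constant([v]) for v in row)
)

derived = dataset.flat_map(replicate)
rows = [(float(p.numpy()), float(f.numpy())) for p, f in derived]
```

The Python min, max and conditional expressions become tf.minimum, tf.maximum and tf.where, which work on symbolic tensors, so the function never needs the concrete row values at trace time.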

This may be a roundabout approach, but you can also combine .concatenate() with apply() to achieve a "flat mapping".

Something like this:

def replicate(ds):
  return (ds.map(lambda s,i1,i2: (s, i1, i2, tf.constant(0.0)))
          .concatenate(ds.map(lambda s,i1,i2: (s, i1, i2, tf.constant(0.25))))
          .concatenate(ds.map(lambda s,i1,i2: (s, i1, i2, tf.constant(0.5))))
          .concatenate(ds.map(lambda s,i1,i2: (s, i1, i2, tf.constant(0.75)))))

derived = dataset.apply(replicate)

This should give you four times as many rows, with the fraction appended as a fourth column rather than placed first.
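As a quick check of the concatenate approach, here is a small sketch run on made-up sample rows. The with_frac helper is hypothetical and is only there to shorten the four near-identical map calls. It also shows the row order this approach produces.

```python
import tensorflow as tf

# Hypothetical stand-in for the three DataFrame columns.
dataset = tf.data.Dataset.from_tensor_slices((
    ["a", "b"],
    [1, 2],
    [10, 20],
))

def with_frac(ds, frac):
    # Append a constant fraction as a fourth column to every row.
    return ds.map(lambda s, i1, i2: (s, i1, i2, tf.constant(frac)))

def replicate(ds):
    out = with_frac(ds, 0.0)
    for frac in (0.25, 0.5, 0.75):
        out = out.concatenate(with_frac(ds, frac))
    return out

derived = dataset.apply(replicate)

# Record (string column, fraction) for each row in iteration order.
order = [(s.numpy().decode(), float(f.numpy())) for s, _, _, f in derived]
```

Because each concatenate appends a full pass over the source, the rows come out grouped by fraction (every source row with 0.0, then every source row with 0.25, and so on) rather than grouped by source row.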
