How to use TensorFlow tf.data.Dataset flat_map to produce a derived dataset?
I have a Pandas DataFrame, and I'm loading part of it into a tf.data Dataset:
dataset = tf.data.Dataset.from_tensor_slices((
    df.StringColumn.values,
    df.IntColumn1.values,
    df.IntColumn2.values,
))
Now I would like to use something like flat_map to produce a derived Dataset that takes the data in each row and produces several rows in the derived Dataset for each row in the original.
But flat_map seems to just pass me placeholder tensors in the lambda function.
I'm using TensorFlow 2.0 alpha 0, if that matters.
Edit:
What I'd like is to be able to write something like this:
def replicate(s, i1, i2):
    return [[0, s, i1, i2],
            [0.25, s, i1, i2],
            [0.5, s, i1, i2],
            [0.75, s, i1, i2]]

derived = dataset.flat_map(replicate)
... and then have derived be a Dataset with four columns and four times as many rows as dataset.
But when I try this, s isn't a value, it's a string placeholder tensor.
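For what it's worth, the placeholder is expected: tf.data traces the function once into a graph, so it receives symbolic tensors rather than row values, and the function given to flat_map has to return a Dataset rather than a Python list. Below is a minimal sketch of the four-way replication under those rules. The sample rows and the FRACTIONS name are made up for illustration, and it was written against TensorFlow 2.x in general, not tested on 2.0 alpha 0 specifically.

```python
import tensorflow as tf

# Hypothetical stand-in for the three DataFrame columns in the question.
dataset = tf.data.Dataset.from_tensor_slices((
    ["a", "b"],   # StringColumn
    [1, 2],       # IntColumn1
    [10, 20],     # IntColumn2
))

FRACTIONS = [0.0, 0.25, 0.5, 0.75]  # illustrative name, not from the question


def replicate(s, i1, i2):
    # s, i1 and i2 are symbolic scalar tensors here. Stack each one into a
    # vector as long as FRACTIONS, then slice the vectors back into rows.
    n = len(FRACTIONS)
    return tf.data.Dataset.from_tensor_slices((
        tf.constant(FRACTIONS, dtype=tf.float32),
        tf.stack([s] * n),
        tf.stack([i1] * n),
        tf.stack([i2] * n),
    ))


derived = dataset.flat_map(replicate)

# Eager iteration yields one tuple of tensors per derived row.
rows = [tuple(t.numpy() for t in row) for row in derived]
```

Each input row is expanded in place, so the four rows for the first input come before the four rows for the second.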
Edit 2:
Okay, what I meant is that the replicate function needs to know the values of the row it's replicating:
slice_count = 16

def price(frac, total, size0, price0, size1, price1, size2, price2, size3, price3):
    # [start, finish) is the part of the total covered by this slice
    total_per_slice = total / slice_count
    start = frac * total_per_slice
    finish = start + total_per_slice
    # each term is a tier's price times its overlap with the slice
    price = \
        (price0 * (min(finish, size0) - max(start, 0) if 0 < finish and start < size0 else 0)) + \
        (price1 * (min(finish, size1) - max(start, size0) if size0 < finish and start < size1 else 0)) + \
        (price2 * (min(finish, size2) - max(start, size1) if size1 < finish and start < size2 else 0)) + \
        (price3 * (min(finish, size3) - max(start, size2) if size2 < finish and start < size3 else 0))
    return price

def replicate(size0, price0, size1, price1, size2, price2, size3, price3):
    total = size0 + size1 + size2 + size3
    return [[
        price(frac, total, size0, price0, size1, price1, size2, price2, size3, price3),
        frac / slice_count] for frac in range(slice_count)]

derived = dataset.flat_map(replicate)
It's not sufficient to just be able to pass placeholders along. Is this something I can do, is it doable only if I can somehow translate it into TensorFlow's computation graph, or is it just not doable the way I'm trying to do it?
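One way the Edit 2 computation might be translated into graph operations is sketched below: Python's min, max and the conditional expression become tf.minimum, tf.maximum and tf.where, and all slices are computed at once as a vector. This assumes the dataset has eight float columns in the order of the replicate arguments (the question does not show that dataset); the tier helper and the sample row are hypothetical.

```python
import tensorflow as tf

SLICE_COUNT = 16


def tier(unit_price, lo, hi, start, finish):
    # Overlap of each slice [start, finish) with the tier [lo, hi), counted
    # only where the question's condition holds, otherwise zero.
    overlap = tf.minimum(finish, hi) - tf.maximum(start, lo)
    inside = tf.logical_and(tf.less(lo, finish), tf.less(start, hi))
    return unit_price * tf.where(inside, overlap, tf.zeros_like(overlap))


def replicate(size0, price0, size1, price1, size2, price2, size3, price3):
    total = size0 + size1 + size2 + size3
    total_per_slice = total / SLICE_COUNT
    fracs = tf.range(SLICE_COUNT, dtype=tf.float32)  # 0, 1, ..., 15
    start = fracs * total_per_slice
    finish = start + total_per_slice
    prices = (tier(price0, 0.0, size0, start, finish)
              + tier(price1, size0, size1, start, finish)
              + tier(price2, size1, size2, start, finish)
              + tier(price3, size2, size3, start, finish))
    # One derived row per slice: (price, frac / slice_count).
    return tf.data.Dataset.from_tensor_slices((prices, fracs / SLICE_COUNT))


# Hypothetical single input row: (size0, price0, ..., size3, price3).
values = (4, 1, 8, 2, 12, 3, 16, 4)
dataset = tf.data.Dataset.from_tensor_slices(
    tuple(tf.constant([v], dtype=tf.float32) for v in values))

derived = dataset.flat_map(replicate)
rows = [(float(p), float(f)) for p, f in derived]
```

The arithmetic mirrors the price function in the question term by term; whether the sizes are meant as cumulative boundaries or per-tier amounts is left exactly as the question wrote it.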
Possibly a long way around, but you can also use .concatenate() with apply() to achieve a 'flat mapping', something like this:
def replicate(ds):
    return (ds.map(lambda s, i1, i2: (s, i1, i2, tf.constant(0.0)))
            .concatenate(ds.map(lambda s, i1, i2: (s, i1, i2, tf.constant(0.25))))
            .concatenate(ds.map(lambda s, i1, i2: (s, i1, i2, tf.constant(0.5))))
            .concatenate(ds.map(lambda s, i1, i2: (s, i1, i2, tf.constant(0.75)))))

derived = dataset.apply(replicate)
This should give you the output you were expecting.
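A compact variant of the same idea, written as a loop so the list of fractions is not repeated by hand, might look like the sketch below. The with_fraction helper and the sample rows are made up for illustration. Note the row order: concatenation emits every input row with the first fraction, then every row with the second, and so on, and the fraction is the last column rather than the first.

```python
import tensorflow as tf


def with_fraction(ds, frac):
    # Append one constant fraction column to every row of ds.
    return ds.map(lambda s, i1, i2: (s, i1, i2, tf.constant(frac)))


def replicate(ds, fractions=(0.0, 0.25, 0.5, 0.75)):
    # Chain one mapped copy of ds per fraction.
    out = with_fraction(ds, fractions[0])
    for frac in fractions[1:]:
        out = out.concatenate(with_fraction(ds, frac))
    return out


# Hypothetical stand-in for the three DataFrame columns in the question.
dataset = tf.data.Dataset.from_tensor_slices((["a", "b"], [1, 2], [10, 20]))

derived = dataset.apply(replicate)
rows = [tuple(t.numpy() for t in row) for row in derived]
```

If the copies of each row need to be adjacent, a flat_map whose function returns a Dataset per row keeps them together instead.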