Merging one tf.data.Dataset with every other element of another one
I would like to merge two tf.data.Dataset objects so that only every other sample of the first one is combined with a sample from the other, without losing any samples.
For example, let's have two lists of numbers:
ds1 = tf.data.Dataset.range(10)
ds10 = tf.data.Dataset.range(10, 60, 10)
I want to combine them so that samples from the second are added to the first, but only every other time:
0, 11, 2, 23, 4, 35, 6, 47, 8, 59
There is a zip method that enables merging two datasets, but it does so by drawing one sample from each -- not combining samples would mean dropping a sample from ds10, which is not what I want.
I could continue from there, zipping ds10 with "dummy" samples that are dropped during the zip with ds1, but that doesn't look very efficient.
Is there an efficient way to do that, without dropping samples (either real or "dummy")?
Try this:
def combine(pair, to_add):
    # pair holds two consecutive elements of ds1; to_add is one element of ds2.
    combined = [pair[0], pair[1] + to_add]
    # from_tensor_slices turns the pair back into two individual samples.
    return tf.data.Dataset.from_tensor_slices(combined)

ds1 = tf.data.Dataset.range(10)
ds2 = tf.data.Dataset.range(10, 60, 10)
combined = tf.data.Dataset.zip((ds1.batch(2), ds2)).flat_map(combine)
Explanation:
First, batch with ds1.batch(2). This produces [(0,1), (2,3), ...].
Zip this to the other dataset to get [((0,1),10), ((2,3),20), ...].
Undo the batching with flat_map and, in the process, combine every (a,b) with c in each [((a,b),c), ...] like [(a,b+c), ...].
The result is then flattened to remove the braces, and you get [0, 11, 2, 23, 4, 35, 6, 47, 8, 59].
Batching and unbatching like this is a common pattern when dealing with several tf.data.Datasets.
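As a sanity check, the same batch / zip / flat_map data flow can be sketched in plain Python without TensorFlow; the batch2 helper below is just an illustration of what ds1.batch(2) does, not part of the tf.data API:

```python
def batch2(xs):
    # Pair up consecutive elements: [0, 1, 2, 3] -> [(0, 1), (2, 3)],
    # mirroring ds1.batch(2).
    it = iter(xs)
    return list(zip(it, it))

ds1 = list(range(10))
ds2 = list(range(10, 60, 10))

combined = []
for (a, b), c in zip(batch2(ds1), ds2):  # the zip step
    combined.extend([a, b + c])          # the flat_map / combine step
print(combined)  # [0, 11, 2, 23, 4, 35, 6, 47, 8, 59]
```

This reproduces the expected output above, confirming that no sample from either dataset is dropped.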