
Merging one tf.data.Dataset with every other element of another one

I would like to merge two tf.data.Dataset objects so that only every other sample of the first one is combined with a sample of the second, without losing any samples.

For example, take two lists of numbers:

ds1 = tf.data.Dataset.range(10)
ds10 = tf.data.Dataset.range(10, 60, 10)

I want to combine them so that samples from the second are added to samples of the first, but only at every other position:

0, 11, 2, 23, 4, 35, 6, 47, 8, 59

There is a zip method that can merge two datasets, but it does so by drawing a sample from each of them -- not combining some of those samples would mean dropping samples from ds10, which is not what I want.
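For reference, this is what a plain zip gives here (a minimal sketch, assuming TF 2.x so that Dataset.as_numpy_iterator is available): it pairs the two datasets element by element and stops once the shorter one is exhausted, so the interleaved pattern above cannot come out of it directly.

import tensorflow as tf

ds1 = tf.data.Dataset.range(10)
ds10 = tf.data.Dataset.range(10, 60, 10)

# zip draws one sample from each dataset and stops with the shorter one,
# producing only the five pairs (0, 10), (1, 20), ..., (4, 50).
for a, b in tf.data.Dataset.zip((ds1, ds10)).as_numpy_iterator():
    print(a, b)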

I could go on from there by zipping ds10 with "dummy" samples that get dropped during the zip with ds1, but that does not look very efficient.
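One possible reading of that padding workaround is sketched below. This is my own illustration, not code from the question: it assumes 0 is used as the dummy value, so the ds1 samples at the dummy positions simply pass through unchanged.

import tensorflow as tf

ds1 = tf.data.Dataset.range(10)
ds10 = tf.data.Dataset.range(10, 60, 10)

# Put a dummy 0 in front of every real ds10 sample: 0, 10, 0, 20, ...
padded = ds10.map(lambda x: tf.stack([tf.zeros_like(x), x])).unbatch()

# Now both datasets have 10 elements, so a plain zip plus add works,
# but every other element carries a dummy value around.
combined = tf.data.Dataset.zip((ds1, padded)).map(lambda a, b: a + b)
# -> 0, 11, 2, 23, 4, 35, 6, 47, 8, 59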

Is there an efficient way to do this, without dropping samples (either real or "dummy" ones)?

Try this:

import tensorflow as tf

def combine(pair, to_add):
    # `pair` holds two consecutive samples of ds1 (thanks to batch(2));
    # add the matching ds2 sample to the second one only.
    combined = [pair[0], pair[1] + to_add]
    return tf.data.Dataset.from_tensor_slices(combined)

ds1 = tf.data.Dataset.range(10)
ds2 = tf.data.Dataset.range(10, 60, 10)

combined = tf.data.Dataset.zip((ds1.batch(2), ds2)).flat_map(combine)

Explanation:

First, batch the first dataset with ds1.batch(2). This produces [(0,1), (2,3), ...].
Zip this with the other dataset to get [((0,1),10), ((2,3),20), ...].
Undo the batching with flat_map, and in the process combine each (a,b) with its c in [((a,b),c), ...] to get [(a,b+c), ...].
The result is then flattened to remove the inner parentheses, and you get [0, 11, 2, 23, 4, 35, 6, 47, 8, 59].
Batching and unbatching like this is a common pattern when dealing with several tf.data.Datasets.
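As a quick check (assuming TF 2.x), iterating over the result reproduces the sequence from the question:

print(list(combined.as_numpy_iterator()))
# [0, 11, 2, 23, 4, 35, 6, 47, 8, 59]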
