
How do you index a RaggedTensor along the ragged dimension, in TensorFlow?

I need to get values from a ragged tensor by indexing along the ragged dimension. Some indexing works ([:, :x], [:, -x:], or [:, x:y]), but direct indexing ([:, x]) does not:

R = tf.RaggedTensor.from_tensor([[1, 2, 3], [4, 5, 6]])
print(R[:, :2]) # RaggedTensor([[1, 2], [4, 5]])
print(R[:, 1:2]) # RaggedTensor([[2], [5]])
print(R[:, 1])  # ValueError: Cannot index into an inner ragged dimension.

The documentation explains why this fails:

RaggedTensors supports multidimensional indexing and slicing, with one restriction: indexing into a ragged dimension is not allowed. This case is problematic because the indicated value may exist in some rows but not others. In such cases, it's not obvious whether we should (1) raise an IndexError; (2) use a default value; or (3) skip that value and return a tensor with fewer rows than we started with. Following the guiding principles of Python ("In the face of ambiguity, refuse the temptation to guess"), we currently disallow this operation.

This makes sense, but how do I actually implement options 1, 2 and 3? Must I convert the ragged tensor into a Python list of Tensors and iterate over them manually? Is there a more efficient solution, one that works entirely inside a TensorFlow graph, without going through the Python interpreter?

If you have a 2D RaggedTensor, then you can get behavior (3) with:

def get_column_slice_v3(rt, column):
  assert column >= 0  # Negative column index not supported
  slice = rt[:, column:column+1]
  return slice.flat_values
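A quick sanity check (my own example, not part of the original answer) on a genuinely ragged input shows rows lacking the requested column being silently dropped; the helper is repeated here so the snippet is self-contained:

```python
import tensorflow as tf

def get_column_slice_v3(rt, column):
  assert column >= 0  # Negative column index not supported
  slice = rt[:, column:column+1]
  # Rows shorter than `column+1` yield an empty slice, so they
  # contribute nothing to flat_values (behavior 3: skip the row).
  return slice.flat_values

rt = tf.ragged.constant([[1, 2, 3], [4], [5, 6]])
# Row [4] has no column 1, so it disappears from the result:
print(get_column_slice_v3(rt, 1).numpy().tolist())  # [2, 6]
```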

And you can get behavior (1) by adding an assertion that rt.nrows() == tf.size(slice.flat_values):

def get_column_slice_v1(rt, column):
  assert column >= 0  # Negative column index not supported
  slice = rt[:, column:column+1]
  # nrows() returns int64, so request int64 from tf.size to match.
  check = tf.debugging.assert_equal(
      rt.nrows(), tf.size(slice.flat_values, out_type=tf.int64))
  with tf.control_dependencies([check]):
    return tf.identity(slice.flat_values)
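To illustrate (my own example, with the helper repeated so the snippet runs on its own): when every row has the requested column the values come back unchanged, and when some row is too short the assertion raises an InvalidArgumentError:

```python
import tensorflow as tf

def get_column_slice_v1(rt, column):
  assert column >= 0  # Negative column index not supported
  slice = rt[:, column:column+1]
  # nrows() returns int64, so request int64 from tf.size to match.
  check = tf.debugging.assert_equal(
      rt.nrows(), tf.size(slice.flat_values, out_type=tf.int64))
  with tf.control_dependencies([check]):
    return tf.identity(slice.flat_values)

rt = tf.ragged.constant([[1, 2, 3], [4], [5, 6]])
print(get_column_slice_v1(rt, 0).numpy().tolist())  # [1, 4, 5]
try:
  get_column_slice_v1(rt, 2)  # row [4] has no column 2
except tf.errors.InvalidArgumentError:
  print("raised InvalidArgumentError, as desired for behavior (1)")
```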

To get behavior (2), I think the easiest way is probably to concatenate a vector of default values and then slice again:

def get_column_slice_v2(rt, column, default=None):
  assert column >= 0  # Negative column index not supported
  slice = rt[:, column:column+1]
  if default is None:
    defaults = tf.zeros([slice.nrows(), 1], slice.dtype)
  else:
    defaults = tf.fill([slice.nrows(), 1], default)
  # Append one default per row, so every row has at least one value,
  # then take the first value of each row.
  slice_plus_defaults = tf.concat([slice, defaults], axis=1)
  slice2 = slice_plus_defaults[:, :1]
  return slice2.flat_values
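For example (my own check, helper repeated for a self-contained snippet): rows that have the column keep their value, and rows that are too short receive the default:

```python
import tensorflow as tf

def get_column_slice_v2(rt, column, default=None):
  assert column >= 0  # Negative column index not supported
  slice = rt[:, column:column+1]
  if default is None:
    defaults = tf.zeros([slice.nrows(), 1], slice.dtype)
  else:
    defaults = tf.fill([slice.nrows(), 1], default)
  # Append one default per row, then take the first value of each row.
  slice_plus_defaults = tf.concat([slice, defaults], axis=1)
  return slice_plus_defaults[:, :1].flat_values

rt = tf.ragged.constant([[1, 2, 3], [4], [5, 6]])
print(get_column_slice_v2(rt, 1, default=-1).numpy().tolist())  # [2, -1, 6]
print(get_column_slice_v2(rt, 1).numpy().tolist())              # [2, 0, 6]
```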

It's possible to extend these to support higher-dimensional ragged tensors, but the logic gets a little more complicated. It should also be possible to extend them to support negative column indices.
