
tensor.numpy() not working in tensorflow.data.Dataset. Throws the error: AttributeError: 'Tensor' object has no attribute 'numpy'

I am using tensorflow 2.0.0-beta1 and python 3.7

First consider the following piece of code where tensor.numpy() works correctly:

import tensorflow as tf
import numpy as np

np.save('data.npy',np.ones(1024))

def func(mystr): 
    return np.load(mystr.numpy())

mystring = tf.constant('data.npy')
print(func(mystring))

The above code works correctly and outputs [1. 1. 1. ... 1. 1. 1.] .

Now consider the following code in which tensor.numpy() doesn't work.

import tensorflow as tf
import numpy as np

np.save('data.npy',np.ones(1024))

def func(mystr):
    return np.load(mystr.numpy())

mystring = tf.constant('data.npy')
data = tf.data.Dataset.from_tensor_slices([mystring])
data.map(func,1)

The above code gives the following error: AttributeError: 'Tensor' object has no attribute 'numpy'

I am unable to figure out why tensor.numpy() doesn't work in the case of tf.data.Dataset.map().

EDIT

The following paragraph clarifies my purpose:

I have a dataset folder which contains millions of data pairs (image, time-series). The entire dataset won't fit into memory, so I am using tf.data.Dataset.map(func). Inside the func() function I want to load a numpy file which contains the time series, and also load the image. For loading the image there are built-in functions in TensorFlow, such as tf.io.read_file and tf.image.decode_jpeg, that accept string tensors. But np.load() does not accept a string tensor. That's why I want to convert the string tensor into a standard Python string.
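One way to do this (a minimal sketch; the filename `series.npy` and the helper `load_series` are illustrative, not from the original post) is to wrap the np.load call in tf.py_function inside map, so the function receives an eager tensor whose .numpy() works:

```python
import numpy as np
import tensorflow as tf

# Toy stand-in for one of the time-series files (filename is illustrative).
np.save('series.npy', np.ones(1024))

def load_series(path_tensor):
    # Inside tf.py_function the argument is an eager tensor,
    # so .numpy() works and yields the byte-string path.
    path = path_tensor.numpy().decode('utf-8')
    return np.load(path).astype(np.float32)

paths = tf.data.Dataset.from_tensor_slices(['series.npy'])
series = paths.map(
    lambda p: tf.py_function(func=load_series, inp=[p], Tout=tf.float32))

for s in series:
    print(s.shape)  # each element is a float32 tensor of shape (1024,)
```

The image half of each pair can stay in pure TensorFlow (tf.io.read_file plus tf.image.decode_jpeg) inside the same map function; only the np.load call needs the tf.py_function wrapper.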

From the .map() documentation:

irrespective of the context in which map_func is defined (eager vs. graph), tf.data traces the function and executes it as a graph.

To use Python code inside .map() you have two options:

  1. Rely on AutoGraph to convert the Python code into an equivalent graph computation. The downside of this approach is that AutoGraph can convert some, but not all, Python code.
  2. Use tf.py_function, which allows you to write arbitrary Python code, but will generally result in worse performance than option 1.

For example:

d = tf.data.Dataset.from_tensor_slices(['hello', 'world'])

# upper-case each element using a plain Python function
def upper_case_fn(t):
    # t.numpy() yields a byte string; decode it to a Python str
    return t.numpy().decode('utf-8').upper()

# wrap the Python function so it can run inside the traced graph
d = d.map(lambda x: tf.py_function(func=upper_case_fn,
          inp=[x], Tout=tf.string))  # ==> [ "HELLO", "WORLD" ]
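Note that Dataset.map returns a new dataset rather than transforming the original in place, so the result has to be captured and iterated to actually see the upper-cased values. A self-contained run:

```python
import tensorflow as tf

def upper_case_fn(t):
    # t is an eager tensor here, so .numpy() is available
    return t.numpy().decode('utf-8').upper()

d = tf.data.Dataset.from_tensor_slices(['hello', 'world'])
upper = d.map(lambda x: tf.py_function(func=upper_case_fn,
                                       inp=[x], Tout=tf.string))

print([s.numpy().decode('utf-8') for s in upper])  # ['HELLO', 'WORLD']
```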

I hope this is still useful.

The difference is that the first example is executed eagerly, while a tf.data.Dataset is inherently lazily evaluated (with good reason).

A dataset can represent arbitrarily large (even infinite) collections of data, so it is only evaluated inside a computation graph, which allows data to be passed through in chunks.

This means that eagerly executed methods such as numpy() are not available inside a dataset pipeline.
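This can be checked directly (a small sketch): tf.data traces map_func when map is called, and the tensor it passes in is symbolic, so it lacks the numpy method that an eagerly created constant has:

```python
import tensorflow as tf

eager_t = tf.constant('data.npy')
print(hasattr(eager_t, 'numpy'))  # True: eager tensors hold concrete values

traced_has_numpy = []

def func(t):
    # runs while tf.data traces the graph; t is a symbolic tensor here
    traced_has_numpy.append(hasattr(t, 'numpy'))
    return t

ds = tf.data.Dataset.from_tensor_slices(['data.npy']).map(func)
print(traced_has_numpy)  # e.g. [False]: the traced tensor has no .numpy()
```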
