
tensor.numpy() not working in tensorflow.data.Dataset. Throws the error: AttributeError: 'Tensor' object has no attribute 'numpy'

I am using tensorflow 2.0.0-beta1 and python 3.7

First consider the following piece of code where tensor.numpy() works correctly:

import tensorflow as tf
import numpy as np

np.save('data.npy',np.ones(1024))

def func(mystr): 
    return np.load(mystr.numpy())

mystring = tf.constant('data.npy')
print(func(mystring))

The above code works correctly and outputs [1. 1. 1. ... 1. 1. 1.] .

Now consider the following code in which tensor.numpy() doesn't work.

import tensorflow as tf
import numpy as np

np.save('data.npy',np.ones(1024))

def func(mystr):
    return np.load(mystr.numpy())

mystring = tf.constant('data.npy')
data = tf.data.Dataset.from_tensor_slices([mystring])
data.map(func,1)

The above code gives the following error: AttributeError: 'Tensor' object has no attribute 'numpy'

I am unable to figure out why tensor.numpy() doesn't work in the case of tf.data.Dataset.map().

EDIT

The following paragraph clarifies my purpose:

I have a dataset folder which contains millions of data pairs (image, time-series). The entire dataset won't fit into memory, so I am using tf.data.Dataset.map(func). Inside the func() function I want to load a numpy file which contains the time series, and also load the image. For loading the image there are built-in functions in TensorFlow, such as tf.io.read_file and tf.image.decode_jpeg, that accept string tensors. But np.load() does not accept a string tensor. That's why I want to convert the string tensor into a standard Python string.
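One way to do this (a minimal sketch; the filename `series.npy` and the helper `load_series` are illustrative, not from the original post) is to wrap the np.load call in tf.py_function inside map, so the function receives an eager tensor whose .numpy() works:

```python
import numpy as np
import tensorflow as tf

# Toy stand-in for one of the time-series files (filename is illustrative).
np.save('series.npy', np.ones(1024))

def load_series(path_tensor):
    # Inside tf.py_function the argument is an eager tensor,
    # so .numpy() works and yields the byte-string path.
    path = path_tensor.numpy().decode('utf-8')
    return np.load(path).astype(np.float32)

paths = tf.data.Dataset.from_tensor_slices(['series.npy'])
series = paths.map(
    lambda p: tf.py_function(func=load_series, inp=[p], Tout=tf.float32))

for s in series:
    print(s.shape)  # each element is a float32 tensor of shape (1024,)
```

The image half of each pair can stay in pure TensorFlow (tf.io.read_file plus tf.image.decode_jpeg) inside the same map function; only the np.load call needs the tf.py_function wrapper.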

From the .map() documentation:

irrespective of the context in which map_func is defined (eager vs. graph), tf.data traces the function and executes it as a graph.

To use Python code inside .map() you have two options:

  1. Rely on AutoGraph to convert the Python code into an equivalent graph computation. The downside of this approach is that AutoGraph can convert some, but not all, Python code.
  2. Use tf.py_function, which allows you to write arbitrary Python code, but will generally result in worse performance than option 1.

For example:

d = tf.data.Dataset.from_tensor_slices(['hello', 'world'])

# upper-case each element using a plain Python function
def upper_case_fn(t):
    # t.numpy() yields a byte string; decode it to a Python str
    return t.numpy().decode('utf-8').upper()

# wrap the Python function so it can run inside the traced graph
d = d.map(lambda x: tf.py_function(func=upper_case_fn,
          inp=[x], Tout=tf.string))  # ==> [ "HELLO", "WORLD" ]
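Note that Dataset.map returns a new dataset rather than transforming the original in place, so the result has to be captured and iterated to actually see the upper-cased values. A self-contained run:

```python
import tensorflow as tf

def upper_case_fn(t):
    # t is an eager tensor here, so .numpy() is available
    return t.numpy().decode('utf-8').upper()

d = tf.data.Dataset.from_tensor_slices(['hello', 'world'])
upper = d.map(lambda x: tf.py_function(func=upper_case_fn,
                                       inp=[x], Tout=tf.string))

print([s.numpy().decode('utf-8') for s in upper])  # ['HELLO', 'WORLD']
```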

I hope this is still useful.

The difference is that the first example is executed eagerly, while a tf.data.Dataset is inherently lazily evaluated (with good reason).

A dataset can represent arbitrarily large (even infinite) collections of data, so it is only evaluated inside a computation graph, which allows data to be passed through in chunks.

This means that eagerly executed methods such as numpy() are not available inside a dataset pipeline.
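This can be checked directly (a small sketch): tf.data traces map_func when map is called, and the tensor it passes in is symbolic, so it lacks the numpy method that an eagerly created constant has:

```python
import tensorflow as tf

eager_t = tf.constant('data.npy')
print(hasattr(eager_t, 'numpy'))  # True: eager tensors hold concrete values

traced_has_numpy = []

def func(t):
    # runs while tf.data traces the graph; t is a symbolic tensor here
    traced_has_numpy.append(hasattr(t, 'numpy'))
    return t

ds = tf.data.Dataset.from_tensor_slices(['data.npy']).map(func)
print(traced_has_numpy)  # e.g. [False]: the traced tensor has no .numpy()
```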
