What is tensorflow.python.data.ops.dataset_ops._OptionsDataset?

I am using the Transformer code from the TensorFlow tutorial: https://www.tensorflow.org/beta/tutorials/text/transformer

In this code, the dataset is loaded like this:

import tensorflow_datasets as tfds

examples, metadata = tfds.load('ted_hrlr_translate/pt_to_en', with_info=True,
                               as_supervised=True)
train_examples, val_examples = examples['train'], examples['validation']

When I check the type of train_examples using:

type(train_examples)

I get the following as output -

tensorflow.python.data.ops.dataset_ops._OptionsDataset

Now I want to change some entries of the dataset, namely the sentences, but I can't because I don't understand this type.

I am able to iterate over it using:

for data in train_examples:
    print(data, type(data))

And the type of data is:

<class 'tuple'>

Ultimately, I want to replace some of these tuples with my own data. Can someone tell me how to do this, or give me some details about the type tensorflow.python.data.ops.dataset_ops._OptionsDataset?

tensorflow.python.data.ops.dataset_ops._OptionsDataset is an internal subclass of the base class tf.compat.v2.data.Dataset (DatasetV2). It wraps the original dataset (the Portuguese-English tuples in your case) together with a tf.data.Options object, so you can use it exactly like any other tf.data.Dataset.

(tf.data.Options configures how the input pipeline executes, for example which optimizations and how much parallelism are applied to transformations such as tf.data.Dataset.map or tf.data.Dataset.interleave.)
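
A quick way to see where this wrapper type comes from is tf.data.Dataset.with_options(), which wraps any dataset together with an Options object. tfds.load most likely does something similar internally to attach its read options, but treat that as an assumption. The sketch below uses a tiny in-memory dataset as a stand-in for examples['train'], and assumes a recent TF 2.x release where the option is named `deterministic`.

```python
import tensorflow as tf

# Small in-memory dataset standing in for examples['train'].
ds = tf.data.Dataset.from_tensor_slices(["a", "b", "c"])

options = tf.data.Options()
options.deterministic = False  # attribute name assumed from recent TF 2.x releases

# with_options() returns a wrapper that holds the options plus the original dataset.
ds_with_options = ds.with_options(options)

print(type(ds_with_options).__name__)               # internal class name; may vary by version
print(isinstance(ds_with_options, tf.data.Dataset))  # the wrapper is still a regular Dataset
print(ds_with_options.element_spec)                  # same element structure as `ds`
print(ds_with_options.options().deterministic)       # the attached option value
```

Because the wrapper is an ordinary Dataset, every Dataset method (map, filter, take, concatenate, ...) works on it directly; you never need to deal with the private class itself.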

How to view the individual elements?

I'm sure there are many ways, but one straightforward way is to use the iterator from the base class:

Since examples['train'] is an _OptionsDataset, it inherits __iter__() from tf.compat.v2.data.Dataset, so you can get an iterator and pull one element from it:

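iterator = iter(examples['train'])  # calls tf.data.Dataset.__iter__()
next_element = next(iterator)       # one (Portuguese, English) tuple of tf.string tensors
pt = next_element[0]
en = next_element[1]
print(pt.numpy())  # .numpy() returns the raw UTF-8 bytes
print(en.numpy())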

Here is the output:

b'o problema \xc3\xa9 que nunca vivi l\xc3\xa1 um \xc3\xbanico dia .'
b"except , i 've never lived one day of my life there ."
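
The escaped sequences in the output (such as \xc3\xa9) are simply UTF-8 bytes. Below is a minimal sketch of looping over several pairs and decoding them to Python str. It assumes an in-memory stand-in for examples['train'] so it runs without downloading the TED data; the second sentence pair is made up for illustration.

```python
import tensorflow as tf

# Hypothetical stand-in for examples['train']: (pt, en) string pairs built in memory.
train_examples = tf.data.Dataset.from_tensor_slices((
    ["o problema é que nunca vivi lá um único dia .", "obrigado ."],
    ["except , i 've never lived one day of my life there .", "thank you ."],
))

decoded = []
# as_numpy_iterator() yields each (pt, en) tuple as a pair of Python bytes objects.
for pt, en in train_examples.take(2).as_numpy_iterator():
    pair = (pt.decode("utf-8"), en.decode("utf-8"))  # bytes -> str
    decoded.append(pair)
    print(pair)

# Decoding the bytes from the output above restores the accented characters.
print(b"o problema \xc3\xa9 que nunca vivi l\xc3\xa1 um \xc3\xbanico dia .".decode("utf-8"))
```

With the real dataset you would replace the stand-in with examples['train'] and adjust take(n) to the number of pairs you want to inspect.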

Substituting with your own data:

Since you haven't said what you want to replace the original data with, I'll assume you have a CSV/TSV file of your own translations. In that case, you can read that file into a separate tf.compat.v2.data.Dataset object using the CSV API:

tf.data.experimental.make_csv_dataset

https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/r2/tutorials/load_data/csv.ipynb
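
A minimal sketch of "replacing" tuples under a few assumptions: your translations live in a hypothetical tab-separated file with no header, an in-memory dataset stands in for examples['train'], and an exact-match filter on one English sentence is a made-up criterion for which pairs to drop. Note that it swaps in the lower-level tf.data.experimental.CsvDataset instead of make_csv_dataset, because CsvDataset yields plain (pt, en) tuples with the same structure as the as_supervised TFDS dataset, whereas make_csv_dataset yields batches of feature dictionaries. Since datasets are immutable, the replacement is done by filtering out unwanted pairs and concatenating your own.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical TSV of your own translations: "portuguese<TAB>english" per line.
tsv_path = os.path.join(tempfile.mkdtemp(), "my_pairs.tsv")
with open(tsv_path, "w", encoding="utf-8") as f:
    f.write("olá mundo .\thello world .\n")
    f.write("bom dia .\tgood morning .\n")

# Two required string columns, tab-delimited, no header row.
# Each element is a (pt, en) tuple of scalar tf.string tensors.
my_pairs = tf.data.experimental.CsvDataset(
    tsv_path, record_defaults=[tf.string, tf.string], field_delim="\t")

# Hypothetical stand-in for examples['train'] with the same element structure.
train_examples = tf.data.Dataset.from_tensor_slices((
    ["o problema é que nunca vivi lá um único dia .", "obrigado ."],
    ["except , i 've never lived one day of my life there .", "thank you ."],
))

# Drop the pairs you want to replace; filter() unpacks each tuple into (pt, en).
kept = train_examples.filter(lambda pt, en: tf.not_equal(en, "thank you ."))

# Append your own pairs; concatenate() requires matching element structures.
combined = kept.concatenate(my_pairs)

result = [(pt.decode("utf-8"), en.decode("utf-8"))
          for pt, en in combined.as_numpy_iterator()]
for pair in result:
    print(pair)
```

If you only want to add data rather than replace it, skip the filter() step and concatenate directly; for training you would normally shuffle the combined dataset afterwards.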
