
How do I load a TensorFlow dataset into a pandas DataFrame?

I am trying to build a transformer model for an abstractive text summarization task. My dataset is CNN/DailyMail (CNN DM), and I am trying to put its features into a pandas DataFrame.

My code:

pip install tensorflow_datasets
import pandas as pd
import tensorflow_datasets as tfds

cnn_builder = tfds.summarization.cnn_dailymail.CnnDailymail()
cnn_info = cnn_builder.info
cnn_builder.download_and_prepare()
datasets = cnn_builder.as_dataset()
train_dataset, test_dataset = datasets["train"], datasets["test"]

reviews = pd.DataFrame({'Text':train_dataset['article'] ,'Summary':train_dataset['highlights'] }) 
reviews.head()

But the output is:

TypeError                                 Traceback (most recent call last)
<ipython-input-45-2da1e32d8eec> in <module>()
----> 1 reviews = pd.DataFrame({'Text':train_ds['article'] ,'Summary':train_ds['highlights'] })
      2 reviews.head()

TypeError: 'PrefetchDataset' object is not subscriptable


After I fixed the code, I got the output below. Could you please help me fix this issue?

b"Richard McLuckie, 48, and Stuart Mackenzie-Walker, 51, invented games.\nWon permission from Marmite owner Unilever to use its name and image.\nThen they went on investment TV show to ask for funding from the Dragons.\nBut Unilever contract said entrepreneurs couldn't mention name Marmite.\nThree Dragons pulled out, but Peter Jones and Duncan Bannatyne agreed.\nThey paid the men \xc2\xa350,000 for a 40 per cent stake in board game business."

You can use the tfds.as_dataframe function:

reviews = tfds.as_dataframe(train_dataset.take(10))
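The DataFrame built this way will not have the Text and Summary column names from the question. As a rough sketch, assuming the columns are named after the feature keys article and highlights (the keys used elsewhere in this thread), they can be selected and renamed with plain pandas. The raw frame below is a hypothetical stand-in with placeholder values, not real CNN/DailyMail records:

```python
import pandas as pd

# Hypothetical stand-in for tfds.as_dataframe(train_dataset.take(10));
# the values are placeholders, not real dataset records.
raw = pd.DataFrame({
    "article": [b"First article body.", b"Second article body."],
    "highlights": [b"First summary.", b"Second summary."],
})

# Keep only the two features of interest and give them the
# column names used in the question.
reviews = raw[["article", "highlights"]].rename(
    columns={"article": "Text", "highlights": "Summary"}
)
print(reviews.columns.tolist())
```

Selecting the two columns explicitly also drops any additional features the dataset may carry.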

Or you can iterate over the dataset and collect the articles and highlights yourself:

import pandas as pd

highlights = []
articles = []

# Each element is a dict of tensors; .numpy() returns the raw bytes value.
for article_highlight in train_dataset.take(10):
  articles.append(article_highlight['article'].numpy())
  highlights.append(article_highlight['highlights'].numpy())

reviews = pd.DataFrame({'Text': articles, 'Summary': highlights})

Note that train_dataset.take(10) yields only the first 10 elements of the dataset; remove the take(10) call to convert the whole training split.
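The b"..." output shown in the question is a Python bytes object, which is what .numpy() returns for a string tensor; sequences such as \xc2\xa3 are the UTF-8 encoding of "£". A minimal sketch of turning such values into ordinary strings follows, assuming the text is UTF-8 encoded. The decode_example helper is a hypothetical name, and raw_example is a shortened copy of the output from the question:

```python
def decode_example(example, encoding="utf-8"):
    """Return a copy of `example` with every bytes value decoded to str."""
    return {
        key: value.decode(encoding) if isinstance(value, bytes) else value
        for key, value in example.items()
    }

# Stand-in for one element after calling .numpy() on each feature
# (shortened from the output shown in the question).
raw_example = {
    "highlights": b"They paid the men \xc2\xa350,000 for a 40 per cent stake in board game business.",
}

decoded = decode_example(raw_example)
print(decoded["highlights"])
```

Inside a loop over the dataset, the same effect can be had by appending article_highlight['article'].numpy().decode('utf-8') instead of the raw bytes.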
