
Text Classification with RNN in TensorFlow - AttributeError: '_IndicatorColumn' object has no attribute 'key'

I'm attempting to make a program that will classify blocks of text into classes using DynamicRnnEstimator in TensorFlow. Unfortunately I'm receiving this error when I run my code:

AttributeError: '_IndicatorColumn' object has no attribute 'key'
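The message names the object that is missing the attribute: something asked an `_IndicatorColumn` for `.key`. A small, purely hypothetical pure-Python illustration of how this kind of error arises when code written against one object type is handed another (the class and function names are made up for the example and are not TensorFlow's):

```python
class OldStyleColumn:
    """Hypothetical column type that exposes a `key` attribute."""

    def __init__(self, key):
        self.key = key


class NewStyleColumn:
    """Hypothetical column type that exposes `name` but has no `key`."""

    def __init__(self, name):
        self.name = name


def column_keys(columns):
    """Hypothetical consumer written against OldStyleColumn: reads `.key`."""
    return [col.key for col in columns]


print(column_keys([OldStyleColumn("Description")]))  # ['Description']

try:
    column_keys([NewStyleColumn("Description")])
except AttributeError as err:
    # The message names the type of the object that lacks the attribute.
    print(err)  # 'NewStyleColumn' object has no attribute 'key'
```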

My data.csv file looks like this:

[Image: screenshot of data.csv]

Here's the code I'm currently running in Python 3:

import tensorflow as tf
import pandas as pd
from sklearn import preprocessing
from gensim import corpora
from nltk.tokenize import WhitespaceTokenizer
import string
from tensorflow.contrib.learn.python.learn.estimators import constants
from tensorflow.contrib.learn.python.learn.estimators import rnn_common


data_df = pd.read_csv('data.csv', encoding='ISO-8859-1').astype('U') #data.csv has 2 columns: "Category", and "Description"

raw_descriptions = data_df['Description']

## Calculate vocab size
descriptions = []
for description in raw_descriptions:
    descriptions.append(WhitespaceTokenizer().tokenize(description))

dictionary = corpora.Dictionary(descriptions)
unique_words = len(dictionary.token2id) #how many unique words do we see? use for hash_bucket_size

## Set up Features and Labels
features = raw_descriptions.to_frame() #pandas_input_fn needs features in DataFrame format
lab_enc = preprocessing.LabelEncoder()
labels = lab_enc.fit_transform(data_df['Category'])
labels = pd.Series(labels) #pandas_input_fn needs the labels in Series format

## Train/Test Split
split = int(.3*len(data_df.index)) #we'll use 30% of our data for testing, 70% for training
features_train = features[:-split]
features_test = features[-split:]
labels_train = labels[:-split]
labels_test = labels[-split:]


n_classes = len(lab_enc.classes_) #how many unique labels do we have?

categorical_column = tf.feature_column.categorical_column_with_hash_bucket('Description', hash_bucket_size=unique_words)
description = tf.feature_column.indicator_column(categorical_column)
feat_cols = [description]

input_func = tf.estimator.inputs.pandas_input_fn(
    x=features_train, 
    y=labels_train, 
    batch_size=100, 
    num_epochs=None, 
    shuffle=False)


classifier = tf.contrib.learn.DynamicRnnEstimator(
    problem_type = constants.ProblemType.CLASSIFICATION,
    prediction_type = rnn_common.PredictionType.SINGLE_VALUE,
    sequence_feature_columns = feat_cols,
    context_feature_columns = None,
    num_units = 5,
    num_classes = n_classes,
    cell_type = 'lstm', 
    optimizer = 'SGD',
    learning_rate = 0.1,
    predict_probabilities = True)

classifier.fit(input_fn=input_func)
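The feature columns above hash each string they receive into one of `hash_bucket_size` buckets, and `indicator_column` roughly turns the resulting ids into a summed one-hot (count) vector. Below is a minimal pure-Python sketch of that idea under two assumptions: MD5 stands in for TensorFlow's own string fingerprint (so the bucket ids will not match TF's), and the text has already been split into words. As far as I can tell, `pandas_input_fn` feeds each whole Description string as a single value, so without tokenizing first the column would hash the entire text as one category. `hash_bucket` and `indicator` are hypothetical helper names.

```python
import hashlib


def hash_bucket(token, num_buckets):
    """Deterministically map a string to a bucket id in [0, num_buckets).

    Assumption: MD5 is only a stand-in for TensorFlow's own string
    fingerprint, so the actual bucket ids will differ from TF's.
    """
    digest = hashlib.md5(token.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def indicator(tokens, num_buckets):
    """Sum of one-hot vectors, one per token (colliding tokens add up)."""
    vec = [0.0] * num_buckets
    for tok in tokens:
        vec[hash_bucket(tok, num_buckets)] += 1.0
    return vec


vec = indicator(["the", "cat", "sat"], num_buckets=8)
print(len(vec))  # 8
print(sum(vec))  # 3.0, one count per token even if two share a bucket
```

Using the vocabulary size as `hash_bucket_size` does not rule out collisions: hashing N distinct words into N buckets will usually send some of them to the same bucket.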

The error occurs on the last line, at classifier.fit(). I'm not sure how to approach this issue. I assume that, since the attribute 'key' is being accessed, something needs to be formatted as a dict, but I'm not sure what, or why.

Any insight is greatly appreciated!
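On the "what needs to be a dict" question: as I understand the TF 1.x behaviour, `pandas_input_fn` already splits the DataFrame into a dict keyed by column name (here `'Description'`), and each batch pairs that dict with a slice of the labels. The generator below is a hypothetical pure-Python illustration of that batch structure (no shuffling, a single pass), not TensorFlow's implementation; `iter_batches` is a made-up name.

```python
def iter_batches(columns, labels, batch_size):
    """Yield (features_dict, labels) pairs in order, without shuffling.

    Hypothetical illustration: every batch is a dict keyed by column
    name, paired with the matching labels. Stops after one epoch.
    """
    n = len(labels)
    for start in range(0, n, batch_size):
        stop = start + batch_size
        features = {name: values[start:stop] for name, values in columns.items()}
        yield features, labels[start:stop]


cols = {"Description": ["a b", "c d", "e f", "g h", "i j"]}
labs = [0, 1, 0, 2, 1]
for feats, y in iter_batches(cols, labs, batch_size=2):
    print(feats, y)
```

With `num_epochs=None`, as in the question, the real input function keeps repeating the data, so training normally needs an explicit stopping point such as `classifier.fit(input_fn=input_func, steps=...)`.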

It looks like your features are set up in a pandas DataFrame, not a dictionary. The following code, for example, shows how pandas DataFrames are expected to be handled in TensorFlow:

import numpy as np

# Convert pandas data into a dict of np arrays.
features = {key: np.array(value) for key, value in dict(features).items()}

Give this a try; I am confident it will fix your problem. In the error you provided, you can see that the internal code is trying to access a "key" attribute (associated with dictionaries, but not with DataFrames).
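A quick way to see what the one-liner above produces, on a hypothetical two-row DataFrame (the sample text is invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame with the same column name as the question.
features = pd.DataFrame({"Description": ["first text", "second text"]})

# Convert pandas data into a dict of np arrays (the one-liner above).
features = {key: np.array(value) for key, value in dict(features).items()}

print(list(features.keys()))          # ['Description']
print(features["Description"].shape)  # (2,)
```

A dict of NumPy arrays like this is the kind of `x` that `tf.estimator.inputs.numpy_input_fn` accepts, which is presumably where it would be used.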

Alex

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

Guangdong ICP Registration No. 18138465 © 2020-2024 STACKOOM.COM