
Text Classification with RNN in TensorFlow - AttributeError: '_IndicatorColumn' object has no attribute 'key'

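I am trying to build a program that uses DynamicRnnEstimator in TensorFlow to classify blocks of text into categories. Unfortunately, I get this error when I run my code:

AttributeError: '_IndicatorColumn' object has no attribute 'key'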

My data.csv file looks like this:

[image not available]

Here is the code I am currently running in Python 3:

import tensorflow as tf
import pandas as pd
from sklearn import preprocessing
from gensim import corpora
from nltk.tokenize import WhitespaceTokenizer
import string
from tensorflow.contrib.learn.python.learn.estimators import constants
from tensorflow.contrib.learn.python.learn.estimators import rnn_common


data_df = pd.read_csv('data.csv', encoding='ISO-8859-1').astype('U') #data.csv has 2 columns: "Category", and "Description"

raw_descriptions = data_df['Description']

## Calculate vocab size
descriptions = []
for description in raw_descriptions:
    descriptions.append(WhitespaceTokenizer().tokenize(description))

dictionary = corpora.Dictionary(descriptions)
unique_words = len(dictionary.token2id) #how many unique words do we see? use for hash_bucket_size

## Set up Features and Labels
features = raw_descriptions.to_frame() #pandas_input_func needs features in DataFrame format
lab_enc = preprocessing.LabelEncoder()
labels = lab_enc.fit_transform(data_df['Category'])
labels = pd.Series(labels) #pandas_input_func needs the labels in Series format

## Train/Test Split
split = int(.3*len(data_df.index)) #we'll use 30% of our data for testing, 70% for training
features_train = features[:-split]
features_test = features[-split:]
labels_train = labels[:-split]
labels_test = labels[-split:]


n_classes = len(lab_enc.classes_) #how many unique labels do we have?

categorical_column = tf.feature_column.categorical_column_with_hash_bucket('Description', hash_bucket_size=unique_words)
description = tf.feature_column.indicator_column(categorical_column)
feat_cols = [description]

input_func = tf.estimator.inputs.pandas_input_fn(
    x=features_train, 
    y=labels_train, 
    batch_size=100, 
    num_epochs=None, 
    shuffle=False)


classifier = tf.contrib.learn.DynamicRnnEstimator(
    problem_type = constants.ProblemType.CLASSIFICATION,
    prediction_type = rnn_common.PredictionType.SINGLE_VALUE,
    sequence_feature_columns = feat_cols,
    context_feature_columns = None,
    num_units = 5,
    num_classes = n_classes,
    cell_type = 'lstm', 
    optimizer = 'SGD',
    learning_rate = 0.1,
    predict_probabilities = True)

classifier.fit(input_fn=input_func)

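The error occurs on the last line, classifier.fit(). I'm not quite sure how to fix it. I assume that, since the attribute 'key' is being accessed, something needs to be formatted as a dictionary, but I'm not sure what or why.

Any insight is much appreciated!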
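Read literally, the message says that some code tried to read a `key` attribute from an `_IndicatorColumn` object, and that object does not define one. The following is a minimal sketch of that failure mode only. `HashedColumn`, `IndicatorColumn`, and `column_key` are hypothetical stand-ins invented for illustration, not TensorFlow's classes, and the sketch does not claim which TensorFlow code path raises the error.

```python
from collections import namedtuple

# Hypothetical stand-ins, NOT TensorFlow classes: one column type that
# exposes a `key` field, and a wrapper type that only stores the column
# it wraps.
HashedColumn = namedtuple("HashedColumn", ["key", "hash_bucket_size"])
IndicatorColumn = namedtuple("IndicatorColumn", ["categorical_column"])

categorical = HashedColumn(key="Description", hash_bucket_size=1000)
indicator = IndicatorColumn(categorical_column=categorical)


def column_key(column):
    """Mimic library code that assumes every column has a `key` attribute."""
    return column.key


# The inner column has a `key` field, so this lookup succeeds.
print(column_key(categorical))

try:
    column_key(indicator)
except AttributeError as err:
    # The wrapper defines no `key`, so plain attribute access fails.
    print("AttributeError:", err)

# The wrapped column is still reachable through the wrapper.
print(indicator.categorical_column.key)
```

The point of the sketch is that the error concerns the type of object handed to the code that reads `key`, so it is worth checking which kind of feature column the estimator expects.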

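It looks like your features are set up in a pandas DataFrame rather than in a dictionary. For example, the code below shows how pandas DataFrames are expected to be handled in TensorFlow: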

import numpy as np

# Convert pandas data into a dict of np arrays.
features = {key: np.array(value) for key, value in dict(features).items()}

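Give it a try; I believe it will solve your problem. In the error you posted, you can see that the internal code is trying to reach a 'key' attribute (used with dictionaries, but not with DataFrames).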
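As a self-contained illustration of the conversion suggested above, here is a small sketch that assumes a one-column DataFrame shaped like the question's `features`. The sample rows and the name `features_dict` are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 'Description' column of data.csv.
features = pd.DataFrame({"Description": ["red apple", "blue car", "green tree"]})

# dict(DataFrame) maps each column name to its Series;
# np.array() then turns each Series into a plain NumPy array.
features_dict = {key: np.array(value) for key, value in dict(features).items()}

print(list(features_dict))                  # column names become dict keys
print(type(features_dict["Description"]))   # each value is a NumPy array
print(features_dict["Description"].shape)   # one entry per row
```

A dict of arrays like this would normally be fed through an input function that accepts NumPy data, such as tf.estimator.inputs.numpy_input_fn in TensorFlow 1.x, instead of pandas_input_fn. The thread does not confirm whether this change alone resolves the error.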

Alex

