None of [Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64', name='index')] are in the [index]

Question

I get this error when trying to run a Keras Sequential model.

Here is my code:

df = pd.DataFrame()
df['category'] = data['category'].rank(method='dense', ascending=False).astype(int)
df['title'] = data['title'].rank(method='dense', ascending=False).astype(int)
df['description'] = data['description']

x = df.description
y = df.category

SEED = 2000
x_train, x_validation_and_test, y_train, y_validation_and_test = train_test_split(x, y, test_size=.02, random_state=SEED)
x_validation, x_test, y_validation, y_test = train_test_split(x_validation_and_test, y_validation_and_test, test_size=.5, random_state=SEED)

And my model:

model.fit_generator(generator=batch_generator(x_train_tfidf, y_train, 32),
                        epochs=5, validation_data=(x_validation_tfidf, y_validation),
                        steps_per_epoch=x_train_tfidf.shape[0]/32)

The error is raised at this line:

steps_per_epoch=x_train_tfidf.shape[0]/32

df.info

<class 'pandas.core.frame.DataFrame'>
Int64Index: 994 entries, 0 to 1092
Data columns (total 3 columns):
category       994 non-null int32
title          994 non-null int32
description    994 non-null object
dtypes: int32(2), object(1)
memory usage: 23.3+ KB

df.index

Int64Index([0, 1, 2, 3, 4, 6, 7, 8, 10, 11,
            ...
            1083, 1084, 1085, 1086, 1087, 1088, 1089, 1090, 1091, 1092],
           dtype='int64', name='index', length=994)

Edit: I can't tell whether this is caused by slicing the data incorrectly or by a problem with the index.

Added more code:

tvec1 = TfidfVectorizer(max_features=100000,ngram_range=(1, 3))
tvec1.fit(x_train)

x_train_tfidf = tvec1.transform(x_train)
x_validation_tfidf = tvec1.transform(x_validation).toarray()

clf = LogisticRegression()
clf.fit(x_train_tfidf, y_train)

clf.score(x_validation_tfidf, y_validation)
clf.score(x_train_tfidf, y_train)

seed = 7
np.random.seed(seed)

Here is my batch generator:

def batch_generator(X_data, y_data, batch_size):
    samples_per_epoch = X_data.shape[0]
    number_of_batches = samples_per_epoch/batch_size
    counter=0
    index = np.arange(np.shape(y_data)[0])
    while 1:
        index_batch = index[batch_size*counter:batch_size*(counter+1)]
        X_batch = X_data[index_batch,:].toarray()
        y_batch = y_data[y_data.index[index_batch]]
        counter += 1
        yield X_batch,y_batch
        if (counter > number_of_batches):
            counter=0

Answer 1

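I think the problem may be that your dataframe's index is missing some values. The dataframe has 994 rows, but the index runs up to 1092, so some rows were dropped and their labels no longer exist. This can cause batch_generator to fail when it indexes into the data.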
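To illustrate the kind of failure described above, here is a minimal sketch using a small made-up Series (the values and labels are hypothetical, not the asker's data). It assumes the lookup that fails is label-based: asking for labels that are not in the index raises a KeyError, while positional access with .iloc does not care about the labels at all.

```python
import pandas as pd

# Hypothetical stand-in for a Series whose index has gaps
# because some rows were dropped earlier.
y = pd.Series([10, 20, 30, 40], index=pd.Index([0, 1, 3, 6], name="index"))

# Positional lookup: third and fourth rows, whatever their labels are.
by_position = y.iloc[[2, 3]].tolist()

# Label lookup: labels 2 and 5 were dropped, so this raises a KeyError.
try:
    y.loc[[2, 5]]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# After resetting the index, the labels are contiguous again (0..3).
y_reset = y.reset_index(drop=True)
labels_after_reset = y_reset.index.tolist()
```

In this sketch the same rows are reachable by position both before and after the reset; only label-based access depends on the gaps.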

So, before all the code you posted, try:

data = data.reset_index(drop=True)
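As an alternative that may be worth trying, the generator can avoid pandas labels entirely by slicing by position. The following is only a sketch under assumptions: the toy X and y are invented for illustration, the last batch is allowed to be smaller than batch_size, and steps_per_epoch is rounded up to an integer, which may or may not match what you want.

```python
import numpy as np
import pandas as pd

def batch_generator(X_data, y_data, batch_size):
    """Yield (X_batch, y_batch) forever, selecting rows by position only."""
    n_samples = X_data.shape[0]
    n_batches = int(np.ceil(n_samples / batch_size))
    y_values = np.asarray(y_data)  # plain array, so index labels no longer matter
    counter = 0
    while True:
        start = batch_size * counter
        stop = min(start + batch_size, n_samples)
        X_batch = X_data[start:stop]
        if hasattr(X_batch, "toarray"):  # densify scipy sparse matrices
            X_batch = X_batch.toarray()
        yield X_batch, y_values[start:stop]
        counter = (counter + 1) % n_batches  # wrap around after the last batch

# Toy data (hypothetical): 10 samples, labels with gaps like in the question.
X = np.arange(20).reshape(10, 2)
y = pd.Series(range(10), index=[0, 1, 2, 3, 4, 6, 7, 8, 10, 11])

gen = batch_generator(X, y, 4)
batches = [next(gen) for _ in range(4)]
steps_per_epoch = int(np.ceil(X.shape[0] / 4))
```

With this version the gaps in the index do not matter, so it should behave the same whether or not the index has been reset.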
