None of [Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64', name='index')] are in the [index]

Question

I got this error while trying to run my Sequential Keras model.

Here is my code:

df = pd.DataFrame()
df['category'] = data['category'].rank(method='dense', ascending=False).astype(int)
df['title'] = data['title'].rank(method='dense', ascending=False).astype(int)
df['description'] = data['description']

x = df.description
y = df.category

SEED = 2000
x_train, x_validation_and_test, y_train, y_validation_and_test = train_test_split(x, y, test_size=.02, random_state=SEED)
x_validation, x_test, y_validation, y_test = train_test_split(x_validation_and_test, y_validation_and_test, test_size=.5, random_state=SEED)

And my model:

model.fit_generator(generator=batch_generator(x_train_tfidf, y_train, 32),
                        epochs=5, validation_data=(x_validation_tfidf, y_validation),
                        steps_per_epoch=x_train_tfidf.shape[0]/32)

I got this error at:

steps_per_epoch=x_train_tfidf.shape[0]/32

df.info

<class 'pandas.core.frame.DataFrame'>
Int64Index: 994 entries, 0 to 1092
Data columns (total 3 columns):
category       994 non-null int32
title          994 non-null int32
description    994 non-null object
dtypes: int32(2), object(1)
memory usage: 23.3+ KB

df.index

Int64Index([ 0, 1, 2, 3, 4, 6, 7, 8, 10, 11, ... 1083, 1084, 1085, 1086, 1087, 1088, 1089, 1090, 1091, 1092], dtype='int64', name='index', length=994)

EDIT: I can't tell whether the error comes from slicing the data incorrectly or from wrong indexing.

Added more code:

tvec1 = TfidfVectorizer(max_features=100000,ngram_range=(1, 3))
tvec1.fit(x_train)

x_train_tfidf = tvec1.transform(x_train)
x_validation_tfidf = tvec1.transform(x_validation).toarray()

clf = LogisticRegression()
clf.fit(x_train_tfidf, y_train)

clf.score(x_validation_tfidf, y_validation)
clf.score(x_train_tfidf, y_train)

seed = 7
np.random.seed(seed)

Here is my batch_generator:

def batch_generator(X_data, y_data, batch_size):
    samples_per_epoch = X_data.shape[0]
    number_of_batches = samples_per_epoch/batch_size
    counter=0
    index = np.arange(np.shape(y_data)[0])
    while 1:
        index_batch = index[batch_size*counter:batch_size*(counter+1)]
        X_batch = X_data[index_batch,:].toarray()
        y_batch = y_data[y_data.index[index_batch]]
        counter += 1
        yield X_batch,y_batch
        if (counter > number_of_batches):
            counter=0

Answer 1

I think your problem may be that your DataFrame's index has gaps. Its length is 994, but the index labels run up to 1092, so some labels are missing (the df.index output above skips 5 and 9, for example). This is probably what causes batch_generator to fail when it indexes the data.
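To illustrate the kind of failure this describes, here is a minimal sketch on made-up toy data (the values and index labels are hypothetical, not taken from the question). It shows that selecting from a pandas Series with a list of integers is a label lookup, which raises KeyError when none of those integers are index labels, while .iloc selects by position and does not depend on the labels.

```python
import pandas as pd

# Hypothetical toy labels: four rows whose index labels are 5, 8, 13, 21.
# None of the labels is 0 or 1, much like a shuffled or gappy index.
y = pd.Series([3, 1, 2, 1], index=[5, 8, 13, 21], name="category")

positions = [0, 1]

# Label-based lookup: 0 and 1 are not index labels, so pandas raises KeyError
# (the message is of the "None of [...] are in the [index]" kind).
try:
    y[positions]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Position-based lookup: valid for any 0 <= position < len(y).
first_two = y.iloc[positions].tolist()

print(label_lookup_failed, first_two)
```

If this is what happens in your case, any code that passes positions 0..9 to a Series whose labels do not include them would fail in the same way.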

So, before any of the code you posted, try resetting the index. Note that reset_index returns a new DataFrame, so assign the result back:

data = data.reset_index(drop=True)
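One caveat, offered as a guess rather than a confirmed diagnosis: train_test_split shuffles rows, so y_train and y_validation may still carry non-contiguous labels even after the reset. The ten positions in the error message also match the size of the validation split (about 2% of 994 rows, then halved), so the label lookup may be happening on y_validation rather than inside the generator. A way to sidestep both is to work with positions only. The sketch below uses toy data, and positional_batch_generator is a hypothetical variant of batch_generator, not code from the question.

```python
import math

import numpy as np
import pandas as pd

# reset_index returns a new DataFrame; the result has to be assigned.
data = pd.DataFrame({"category": [2, 1, 2, 1]}, index=[0, 1, 3, 6])  # toy data with gaps
data = data.reset_index(drop=True)


def positional_batch_generator(X_data, y_data, batch_size):
    """Hypothetical variant of batch_generator that never uses index labels."""
    y_array = np.asarray(y_data)  # drops the pandas index entirely
    n_samples = X_data.shape[0]
    n_batches = math.ceil(n_samples / batch_size)
    while True:  # loop forever, as Keras generators are expected to
        for b in range(n_batches):
            start = b * batch_size
            stop = min(start + batch_size, n_samples)
            rows = np.arange(start, stop)
            X_batch = X_data[rows]
            # A scipy sparse matrix (e.g. TfidfVectorizer output) needs densifying.
            if hasattr(X_batch, "toarray"):
                X_batch = X_batch.toarray()
            yield X_batch, y_array[rows]
```

With a generator like this, steps_per_epoch could be math.ceil(x_train_tfidf.shape[0] / 32) so that it is an integer, and passing y_validation.to_numpy() instead of the Series in validation_data would avoid label lookups there as well.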

Answer 1
-1 2020-06-06 21:25:58
