KeyError: "None of [Int64Index([ 12313,\n , 34534],\n dtype='int64', leng
None of [Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64', name='index')] are in the [index]
I get this error when trying to run a Sequential Keras model.
Here is my code:
df = pd.DataFrame()
df['category'] = data['category'].rank(method='dense', ascending=False).astype(int)
df['title'] = data['title'].rank(method='dense', ascending=False).astype(int)
df['description'] = data['description']
x = df.description
y = df.category
SEED = 2000
x_train, x_validation_and_test, y_train, y_validation_and_test = train_test_split(x, y, test_size=.02, random_state=SEED)
x_validation, x_test, y_validation, y_test = train_test_split(x_validation_and_test, y_validation_and_test, test_size=.5, random_state=SEED)
And my model:
model.fit_generator(generator=batch_generator(x_train_tfidf, y_train, 32),
                    epochs=5, validation_data=(x_validation_tfidf, y_validation),
                    steps_per_epoch=x_train_tfidf.shape[0]/32)
The error is raised at:
steps_per_epoch=x_train_tfidf.shape[0]/32
df.info
<class 'pandas.core.frame.DataFrame'>
Int64Index: 994 entries, 0 to 1092
Data columns (total 3 columns):
category       994 non-null int32
title          994 non-null int32
description    994 non-null object
dtypes: int32(2), object(1)
memory usage: 23.3+ KB
df.index
Int64Index([ 0, 1, 2, 3, 4, 6, 7, 8, 10, 11, ... 1083, 1084, 1085, 1086, 1087, 1088, 1089, 1090, 1091, 1092], dtype='int64', name='index', length=994)
Edit: I can't tell whether this is caused by slicing the data incorrectly or by an indexing error.
Added more code:
tvec1 = TfidfVectorizer(max_features=100000,ngram_range=(1, 3))
tvec1.fit(x_train)
x_train_tfidf = tvec1.transform(x_train)
x_validation_tfidf = tvec1.transform(x_validation).toarray()
clf = LogisticRegression()
clf.fit(x_train_tfidf, y_train)
clf.score(x_validation_tfidf, y_validation)
clf.score(x_train_tfidf, y_train)
seed = 7
np.random.seed(seed)
Here is my batch generator:
def batch_generator(X_data, y_data, batch_size):
    samples_per_epoch = X_data.shape[0]
    number_of_batches = samples_per_epoch/batch_size
    counter=0
    index = np.arange(np.shape(y_data)[0])
    while 1:
        index_batch = index[batch_size*counter:batch_size*(counter+1)]
        X_batch = X_data[index_batch,:].toarray()
        y_batch = y_data[y_data.index[index_batch]]
        counter += 1
        yield X_batch,y_batch
        if (counter > number_of_batches):
            counter=0
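I think your problem may be that your dataframe's index is missing some values. That is, the length is 994 but the index goes up to 1092, so some rows have been dropped. This can cause batch_generator to fail when it indexes the dataframe.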
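As a rough illustration of why gaps in the index matter, here is a minimal sketch on hypothetical toy data (the frame and its values are made up for the example, not taken from the question). It shows a label-based lookup failing with a KeyError when the positions being counted do not exist as index labels, while a position-based lookup with .iloc still works.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: keeping only the last rows leaves labels 3, 4, 5.
data = pd.DataFrame({"category": [3, 1, 2, 1, 3, 2]})
df = data.iloc[3:]

# A batch generator typically counts positions starting from 0.
positions = np.arange(len(df))  # array([0, 1, 2])

# Label-based lookup: none of the labels 0, 1, 2 exist, so pandas raises KeyError.
try:
    df["category"].loc[positions]
    label_lookup_ok = True
except KeyError:
    label_lookup_ok = False

# Position-based lookup ignores the labels entirely.
by_position = df["category"].iloc[positions].tolist()

print(label_lookup_ok)  # False
print(by_position)      # [1, 3, 2]
```

If this is indeed the cause, selecting the labels by position inside the generator (for example y_data.iloc[index_batch]) may be another way to sidestep the gaps, though that is a suggestion rather than something tested against the code in the question.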
So, before any of the code you posted, try using:
data = data.reset_index(drop=True)
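To make the effect of the reset concrete, here is a small sketch on a hypothetical three-row frame (the values and the gapped index are invented for illustration). It assumes the default behaviour of reset_index, which returns a new DataFrame rather than modifying the original, so the result has to be assigned back for the change to stick.

```python
import pandas as pd

# Hypothetical toy frame whose index has gaps, similar in spirit to the 994-row frame.
data = pd.DataFrame({"description": ["a", "b", "c"]}, index=[0, 2, 5])

# Calling reset_index without assigning the result leaves `data` unchanged.
data.reset_index(drop=True)
unchanged_index = data.index.tolist()

# Assigning the result gives a contiguous 0..n-1 index; drop=True discards the old labels.
data = data.reset_index(drop=True)
clean_index = data.index.tolist()

print(unchanged_index)  # [0, 2, 5]
print(clean_index)      # [0, 1, 2]
```

Note that train_test_split shuffles the rows, so the Series it returns still carry non-contiguous labels afterwards; if label lookups are used on those, they may need the same treatment.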