None of [Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64', name='index')] are in the [index]
I got this error while trying to run my Sequential Keras model.
Here is my code:
df = pd.DataFrame()
df['category'] = data['category'].rank(method='dense', ascending=False).astype(int)
df['title'] = data['title'].rank(method='dense', ascending=False).astype(int)
df['description'] = data['description']
x = df.description
y = df.category
SEED = 2000
x_train, x_validation_and_test, y_train, y_validation_and_test = train_test_split(x, y, test_size=.02, random_state=SEED)
x_validation, x_test, y_validation, y_test = train_test_split(x_validation_and_test, y_validation_and_test, test_size=.5, random_state=SEED)
And my model:
model.fit_generator(generator=batch_generator(x_train_tfidf, y_train, 32),
epochs=5, validation_data=(x_validation_tfidf, y_validation),
steps_per_epoch=x_train_tfidf.shape[0]/32)
I got this error at:
steps_per_epoch=x_train_tfidf.shape[0]/32
df.info
<class 'pandas.core.frame.DataFrame'>
Int64Index: 994 entries, 0 to 1092
Data columns (total 3 columns):
category       994 non-null int32
title          994 non-null int32
description    994 non-null object
dtypes: int32(2), object(1)
memory usage: 23.3+ KB
df.index
Int64Index([ 0, 1, 2, 3, 4, 6, 7, 8, 10, 11, ... 1083, 1084, 1085, 1086, 1087, 1088, 1089, 1090, 1091, 1092], dtype='int64', name='index', length=994)
EDIT: I don't understand whether this comes from slicing the data incorrectly or from wrong indexing.
Added more code:
tvec1 = TfidfVectorizer(max_features=100000,ngram_range=(1, 3))
tvec1.fit(x_train)
x_train_tfidf = tvec1.transform(x_train)
x_validation_tfidf = tvec1.transform(x_validation).toarray()
clf = LogisticRegression()
clf.fit(x_train_tfidf, y_train)
clf.score(x_validation_tfidf, y_validation)
clf.score(x_train_tfidf, y_train)
seed = 7
np.random.seed(seed)
Here is my batch_generator:
def batch_generator(X_data, y_data, batch_size):
    samples_per_epoch = X_data.shape[0]
    number_of_batches = samples_per_epoch/batch_size
    counter=0
    index = np.arange(np.shape(y_data)[0])
    while 1:
        index_batch = index[batch_size*counter:batch_size*(counter+1)]
        X_batch = X_data[index_batch,:].toarray()
        y_batch = y_data[y_data.index[index_batch]]
        counter += 1
        yield X_batch,y_batch
        if (counter > number_of_batches):
            counter=0
I think your problem may be that your DataFrame's index has gaps. That is, its length is 994 but the index labels run up to 1092, so some rows have been left out. This is probably causing batch_generator to fail when indexing the DataFrame.
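To illustrate this kind of failure, here is a minimal sketch using a small hypothetical Series (not the asker's data) whose labels have gaps. It shows that a label-based lookup with consecutive integers raises a KeyError when those labels are missing, while a positional lookup, or a lookup after resetting the index, returns the expected rows. The variable names are assumptions made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a column like y_train: 5 rows, labels with gaps
y = pd.Series([10, 20, 30, 40, 50], index=[0, 1, 4, 6, 7])

# Consecutive integers, as a batch generator might produce
positions = np.arange(2, 4)  # array([2, 3])

# Label-based lookup: labels 2 and 3 do not exist, so pandas raises KeyError
try:
    y.loc[positions]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Positional lookup ignores the labels entirely
by_position = y.iloc[positions].to_numpy()

# After resetting the index, labels and positions coincide again
y_reset = y.reset_index(drop=True)
by_label_after_reset = y_reset.loc[positions].to_numpy()

print(label_lookup_failed)   # True
print(by_position)           # [30 40]
print(by_label_after_reset)  # [30 40]
```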
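So, before all of your provided code, try resetting the index. Note that reset_index returns a new DataFrame rather than modifying data in place, so assign the result back:
data = data.reset_index(drop=True)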
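As an alternative that does not depend on the index labels at all, the generator could select rows purely by position. The following is only a sketch under stated assumptions: a toy NumPy array stands in for the TF-IDF matrix, the labels are converted to a plain array up front, and the wrap-around logic is a simplification rather than the asker's original code.

```python
import numpy as np
import pandas as pd


def batch_generator(X_data, y_data, batch_size):
    """Yield (X_batch, y_batch) forever, selecting rows by position."""
    n_samples = X_data.shape[0]
    n_batches = int(np.ceil(n_samples / batch_size))
    y_values = np.asarray(y_data)  # plain array: the pandas index is dropped
    counter = 0
    while True:
        start = batch_size * counter
        stop = min(start + batch_size, n_samples)
        rows = np.arange(start, stop)
        X_batch = X_data[rows]
        if hasattr(X_batch, "toarray"):  # densify SciPy sparse slices, if any
            X_batch = X_batch.toarray()
        yield X_batch, y_values[rows]
        counter = (counter + 1) % n_batches  # wrap around after the last batch


# Toy data (assumption): 5 samples, labels indexed with gaps as in the question
X = np.arange(10).reshape(5, 2)
y = pd.Series([1, 2, 3, 4, 5], index=[0, 1, 4, 6, 7])

gen = batch_generator(X, y, batch_size=2)
batches = [next(gen) for _ in range(4)]
```

With this toy input the generator produces batches of sizes 2, 2, and 1 and then starts over, regardless of which labels the Series carries.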