繁体   English   中英

[Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64', name='index')] 中没有一个在 [index]

[英]None of [Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64', name='index')] are in the [index]

我在尝试运行顺序 keras model 时遇到此错误。

这是我的代码:

df = pd.DataFrame()
df['category'] = data['category'].rank(method='dense', ascending=False).astype(int)
df['title'] = data['title'].rank(method='dense', ascending=False).astype(int)
df['description'] = data['description']

x = df.description
y = df.category

SEED = 2000
x_train, x_validation_and_test, y_train, y_validation_and_test = train_test_split(x, y, test_size=.02, random_state=SEED)
x_validation, x_test, y_validation, y_test = train_test_split(x_validation_and_test, y_validation_and_test, test_size=.5, random_state=SEED)

还有我的 model:

model.fit_generator(generator=batch_generator(x_train_tfidf, y_train, 32),
                        epochs=5, validation_data=(x_validation_tfidf, y_validation),
                        steps_per_epoch=x_train_tfidf.shape[0]/32)

我在以下位置得到了这个错误:

steps_per_epoch=x_train_tfidf.shape[0]/32

df.info

 <class 'pandas.core.frame.DataFrame'> Int64Index: 994 entries, 0 to 1092 Data columns (total 3 columns): category 994 non-null int32 title 994 non-null int32 description 994 non-null object dtypes: int32(2), object(1) memory usage: 23.3+ KB

df.index

 Int64Index([ 0, 1, 2, 3, 4, 6, 7, 8, 10, 11, ... 1083, 1084, 1085, 1086, 1087, 1088, 1089, 1090, 1091, 1092], dtyp

e='int64',名称='索引',长度=994)

编辑:我不明白是因为数据切片不正确还是索引错误。

添加了更多代码:

tvec1 = TfidfVectorizer(max_features=100000,ngram_range=(1, 3))
tvec1.fit(x_train)

x_train_tfidf = tvec1.transform(x_train)
x_validation_tfidf = tvec1.transform(x_validation).toarray()

clf = LogisticRegression()
clf.fit(x_train_tfidf, y_train)

clf.score(x_validation_tfidf, y_validation)
clf.score(x_train_tfidf, y_train)

seed = 7
np.random.seed(seed)

这是我的批处理生成器:

def batch_generator(X_data, y_data, batch_size):
    samples_per_epoch = X_data.shape[0]
    number_of_batches = samples_per_epoch/batch_size
    counter=0
    index = np.arange(np.shape(y_data)[0])
    while 1:
        index_batch = index[batch_size*counter:batch_size*(counter+1)]
        X_batch = X_data[index_batch,:].toarray()
        y_batch = y_data[y_data.index[index_batch]]
        counter += 1
        yield X_batch,y_batch
        if (counter > number_of_batches):
            counter=0

我认为您的问题可能是您的 dataframe 的索引缺少一些值。 也就是说,长度为 994,但您的索引最多为 1092,因此有些行被遗漏了。 这可能导致在索引 dataframe 时batch_generator失败。

因此,在您提供的所有代码之前,请尝试使用:

data.reset_index(drop=True)

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM