
None of [Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64', name='index')] are in the [index]

I got this error while trying to run my sequential Keras model.

Here is my code:

df = pd.DataFrame()
df['category'] = data['category'].rank(method='dense', ascending=False).astype(int)
df['title'] = data['title'].rank(method='dense', ascending=False).astype(int)
df['description'] = data['description']

x = df.description
y = df.category

SEED = 2000
x_train, x_validation_and_test, y_train, y_validation_and_test = train_test_split(x, y, test_size=.02, random_state=SEED)
x_validation, x_test, y_validation, y_test = train_test_split(x_validation_and_test, y_validation_and_test, test_size=.5, random_state=SEED)

And my model:

model.fit_generator(generator=batch_generator(x_train_tfidf, y_train, 32),
                        epochs=5, validation_data=(x_validation_tfidf, y_validation),
                        steps_per_epoch=x_train_tfidf.shape[0]/32)

I get this error at:

steps_per_epoch=x_train_tfidf.shape[0]/32

df.info

    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 994 entries, 0 to 1092
    Data columns (total 3 columns):
    category       994 non-null int32
    title          994 non-null int32
    description    994 non-null object
    dtypes: int32(2), object(1)
    memory usage: 23.3+ KB

df.index

    Int64Index([0, 1, 2, 3, 4, 6, 7, 8, 10, 11, ... 1083, 1084, 1085, 1086, 1087, 1088, 1089, 1090, 1091, 1092], dtype='int64', name='index', length=994)

EDIT: I can't tell whether this comes from slicing the data improperly or from a wrong index.

Added more code:

tvec1 = TfidfVectorizer(max_features=100000,ngram_range=(1, 3))
tvec1.fit(x_train)

x_train_tfidf = tvec1.transform(x_train)
x_validation_tfidf = tvec1.transform(x_validation).toarray()

clf = LogisticRegression()
clf.fit(x_train_tfidf, y_train)

clf.score(x_validation_tfidf, y_validation)
clf.score(x_train_tfidf, y_train)

seed = 7
np.random.seed(seed)

Here is my batch_generator:

def batch_generator(X_data, y_data, batch_size):
    samples_per_epoch = X_data.shape[0]
    number_of_batches = samples_per_epoch/batch_size
    counter=0
    index = np.arange(np.shape(y_data)[0])
    while 1:
        index_batch = index[batch_size*counter:batch_size*(counter+1)]
        X_batch = X_data[index_batch,:].toarray()
        y_batch = y_data[y_data.index[index_batch]]
        counter += 1
        yield X_batch,y_batch
        if (counter > number_of_batches):
            counter=0

I think the problem is that your dataframe's index has gaps: its length is 994, but the labels run up to 1092, so some labels in between are missing. This makes batch_generator fail when it uses positional indices as labels.
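A minimal sketch of the failure mode (toy data, not the asker's frame): dropping rows leaves gaps in an integer index, and plain `[]` on an integer-indexed Series looks labels up rather than positions:

```python
import numpy as np
import pandas as pd

# A Series whose index has gaps, like the 994-row frame
# whose labels run up to 1092.
y = pd.Series(range(10)).drop([5, 9])
positions = np.arange(len(y))  # 0..7, as batch_generator builds them

# Plain [] on an integer-indexed Series is label-based, so position 5
# is looked up as the (missing) label 5 and pandas raises a KeyError.
try:
    y[positions]
except KeyError as exc:
    print("lookup failed:", exc)
```

The exact message varies by pandas version ("None of [Int64Index([...])] are in the [index]" in older releases), but the cause is the same label/position mismatch.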

So, before all of your provided code, try resetting the index. Note the assignment back: reset_index returns a new frame unless you pass inplace=True.

data = data.reset_index(drop=True)
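A quick sketch (again with toy data) of what the reset does and why the generator's lookups then succeed:

```python
import numpy as np
import pandas as pd

# Toy frame with the same kind of gapped index (rows 5 and 9 dropped).
df = pd.DataFrame({'category': range(10)}).drop([5, 9])
print(df.index.tolist())        # [0, 1, 2, 3, 4, 6, 7, 8]

# reset_index returns a new frame, so assign it back.
df = df.reset_index(drop=True)
print(df.index.tolist())        # [0, 1, 2, 3, 4, 5, 6, 7]

# Positions 0..len(df)-1 are now also valid labels, so
# position-based lookups like the generator's succeed.
batch = df['category'][np.arange(3)]
print(batch.tolist())           # [0, 1, 2]
```

Alternatively, switching the generator to purely positional access with `y_data.iloc[index_batch]` avoids the label/position mismatch without touching the index at all.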
