Keras fit_generator InvalidArgumentError on last step of steps_per_epoch

I'm working with an imbalanced dataset with two classes, 0 and 1. I've built a batch_generator function that ensures both classes appear in every batch so that the AUC can be computed. This works fine until the last step of "steps_per_epoch", where training fails with:

InvalidArgumentError: ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.

I can confirm that my y_test (validation) includes both classes. I've tried several values of "steps_per_epoch" and the error always occurs at the last step. For instance, if steps_per_epoch = 5, the error occurs at step 4. The code for my batch generator is below. Any ideas what is causing this?

import math

import numpy as np


def batch_generator(train, x_train, batch_size):
    '''
    Yield batches that include at least one sample of each class (there are 2 classes).

    train = train set dataframe
    x_train = features (array) that have been tokenized and padded
    batch_size = number of samples per batch
    '''
    # Split features and labels by class once, before the loop.
    # Indexing x_train with the dataframe index assumes that index is
    # positional (0..n-1) and lines up with the rows of x_train.
    class0_index = train[train.CLASS != 1].index
    x_class0 = x_train[class0_index]
    y_class0 = train[train.CLASS.index.isin(class0_index)].CLASS.values
    class0_size = math.floor(batch_size * .99)

    class1_index = train[train.CLASS == 1].index
    x_class1 = x_train[class1_index]
    y_class1 = train[train.CLASS.index.isin(class1_index)].CLASS.values
    # Note: this is 0 when batch_size < 100, so such a batch would have no class 1.
    class1_size = math.floor(batch_size * .01)

    while True:
        # class 0: sample row positions (with replacement), then append the label column
        class0_batch_index = np.random.choice(x_class0.shape[0], size=class0_size)
        x_BATCH_class0 = x_class0[class0_batch_index]
        y_BATCH_class0 = y_class0[class0_batch_index].reshape(class0_size, 1)
        BATCH_class0 = np.hstack((x_BATCH_class0, y_BATCH_class0))

        # class 1: same procedure
        class1_batch_index = np.random.choice(x_class1.shape[0], size=class1_size)
        x_BATCH_class1 = x_class1[class1_batch_index]
        y_BATCH_class1 = y_class1[class1_batch_index].reshape(class1_size, 1)
        BATCH_class1 = np.hstack((x_BATCH_class1, y_BATCH_class1))

        # put both classes together and shuffle the rows
        BATCH = np.vstack((BATCH_class0, BATCH_class1))
        np.random.shuffle(BATCH)

        # the last column holds the labels
        x_BATCH = BATCH[:, :-1]
        y_BATCH = BATCH[:, -1:]

        yield x_BATCH, y_BATCH


batch_size = 2000
num_batches = 10
epochs = 5
model.fit_generator(batch_generator(train, x_train, batch_size=batch_size), epochs=epochs, validation_data=(x_test, y_test), steps_per_epoch=num_batches)

Epoch 1/5
 9/10 [==========================>...] - ETA: 1s - loss: 0.0784 - auc: 0.4790

InvalidArgumentError                      Traceback (most recent call last)
c:\programdata\anaconda3\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
   1322     try:
-> 1323       return fn(*args)
   1324     except errors.OpError as e:

c:\programdata\anaconda3\lib\site-packages\tensorflow\python\client\session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1301                                    feed_dict, fetch_list, target_list,
-> 1302                                    status, run_metadata)
   1303 

c:\programdata\anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    472             compat.as_text(c_api.TF_Message(self.status.status)),
--> 473             c_api.TF_GetCode(self.status.status))
    474     # Delete the underlying status object from memory otherwise it stays alive

InvalidArgumentError: ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.
     [[Node: metrics/auc/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT], Tout=[DT_DOUBLE], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_dense_1_target_0_1, dense_1/Sigmoid/_93)]]
     [[Node: metrics/auc/PyFunc/_123 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_211_metrics/auc/PyFunc", tensor_type=DT_DOUBLE, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

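It has nothing to do with steps_per_epoch itself. Keras runs validation at the end of each epoch, right after the last training step, which is why the error shows up there. As the error message says, the AUC score needs both classes to be present before it can be calculated. My guess is that your validation data has only one class present in y_test. Check the test split with np.mean(y_test): you will probably get exactly 0 or 1, whereas it should be strictly between 0 and 1.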
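
The check can be taken one step further with a small sketch like the one below. Keras computes metrics batch by batch, so a single validation batch holding only one class would raise the same error even when y_test as a whole contains both. The helper name single_class_batches and the toy labels are made up for illustration, and the batch size Keras actually uses for validation depends on the version, so treat batch_size here as an assumption.

```python
import numpy as np


def single_class_batches(y, batch_size):
    """Return start offsets of consecutive batches of y that hold only one class.

    Hypothetical helper for illustration; it is not part of Keras.
    """
    y = np.asarray(y).ravel()
    offsets = []
    for start in range(0, len(y), batch_size):
        # fewer than 2 distinct labels means AUC is undefined for this batch
        if np.unique(y[start:start + batch_size]).size < 2:
            offsets.append(start)
    return offsets


# Toy stand-in for y_test: 10 labels, positives only at positions 1 and 6.
y_demo = np.array([0, 1, 0, 0, 0, 0, 1, 0, 0, 0])

# Strictly between 0 and 1, so both classes are present overall.
overall_mean = float(np.mean(y_demo))

# Slices [0:4], [4:8], [8:10] are checked one by one.
bad_offsets = single_class_batches(y_demo, 4)
print(overall_mean, bad_offsets)
```

If the list of offsets is non-empty for the real y_test, the per-batch AUC metric will fail on those slices regardless of what the training generator yields.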
