Keras - CNN - adjusting the dataset - removing a biased class and a data augmentation attempt

I have a dilemma with a model I developed on the popular skin cancer image dataset. I should say up front that I'm looking for some guidance on two issues:

A.

The original dataset contains over 10K images, of which nearly 7,000 belong to just one of the seven classes. I created a subset of 4,948 random images and ran a function over it that converts the images into a list of lists - the first element holding the image and the second the class - while leaving out any images of the dominant class (the class with the ~6,800 images). The thinking was to normalize the distribution across classes.

Re-running the original model with the adjusted output (a Dense layer of 6 neurons instead of 7) produces an error.

Did I miss a step where I "tell" the model that there are only six possible classes? The model only runs when the output layer has seven neurons.
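
For reference, here is a minimal sketch (hypothetical, not from the original post) that reproduces the same failure: with sparse_categorical_crossentropy - the loss the traceback below points at - and a 6-unit softmax head, any label value of 6 falls outside the valid range [0, 6):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# 6 output units => valid sparse labels are 0..5
model = Sequential([Dense(6, activation="softmax", input_shape=(4,))])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")

x = np.random.rand(8, 4).astype("float32")
y = np.array([0, 1, 2, 3, 4, 6, 1, 2])  # the 6 is out of range

# Raises InvalidArgumentError: "Received a label value of 6 which is
# outside the valid range of [0, 6)" - the same error as below.
model.fit(x, y, epochs=1)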

Error:

Train on 1245 samples, validate on 312 samples
Epoch 1/30
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-138-8a3b40a69e37> in <module>
     25              metrics=["accuracy"])
     26 
---> 27 model.fit(X_train, y_train, batch_size=32, epochs=30, validation_split=0.2)

/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    778           validation_steps=validation_steps,
    779           validation_freq=validation_freq,
--> 780           steps_name='steps_per_epoch')
    781 
    782   def evaluate(self,

/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/training_arrays.py in model_iteration(model, inputs, targets, sample_weights, batch_size, epochs, verbose, callbacks, val_inputs, val_targets, val_sample_weights, shuffle, initial_epoch, steps_per_epoch, validation_steps, validation_freq, mode, validation_in_fit, prepared_feed_values_from_dataset, steps_name, **kwargs)
    361 
    362         # Get outputs.
--> 363         batch_outs = f(ins_batch)
    364         if not isinstance(batch_outs, list):
    365           batch_outs = [batch_outs]

/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/backend.py in __call__(self, inputs)
   3290 
   3291     fetched = self._callable_fn(*array_vals,
-> 3292                                 run_metadata=self.run_metadata)
   3293     self._call_fetch_callbacks(fetched[-len(self._fetches):])
   3294     output_structure = nest.pack_sequence_as(

/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in __call__(self, *args, **kwargs)
   1456         ret = tf_session.TF_SessionRunCallable(self._session._session,
   1457                                                self._handle, args,
-> 1458                                                run_metadata_ptr)
   1459         if run_metadata:
   1460           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

InvalidArgumentError: Received a label value of 6 which is outside the valid range of [0, 6).  Label values: 1 1 2 4 2 1 2 1 2 1 2 2 4 2 2 1 3 1 4 6 0 2 4 2 0 4 2 4 4 0 2 4
     [[{{node loss_15/activation_63_loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]

B.

I'm trying to add data augmentation because the dataset is relatively small given the number of classes and how sparsely the images are spread across them. As soon as I try to run the generator I get the error message below, which suggests something is wrong with one of the variables in the validation_data tuple. I can't work out what the problem is.
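
As a sanity check (an assumption on my part, not something the post confirms), validation_data expects a tuple of NumPy arrays whose first dimensions match, with labels already in range for the output layer - something along these lines, reusing the X_test/y_test names from the code below:

import numpy as np

# Hypothetical check: both halves of the validation_data tuple should be
# arrays, shaped like the training data, with in-range integer labels.
X_test = np.asarray(X_test).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
y_test = np.asarray(y_test)                 # shape (n,), integer class codes
assert X_test.shape[0] == y_test.shape[0]
assert y_test.max() < 6                     # must fit a 6-unit output layer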

Sample values from the test set look like this:

[[[[0.41568627]
   [0.4       ]
   [0.43137255]
   ...
   [0.54509804]
   [0.54901961]
   [0.54509804]]

  [[0.42352941]
   [0.43137255]
   [0.43921569]
   ...
   [0.56078431]
   [0.54117647]
   [0.55294118]]

  [[0.41960784]
   [0.41960784]
   [0.45490196]
   ...
   [0.51764706]
   [0.57254902]
   [0.50588235]]

  ...

  [[0.30980392]
   [0.36470588]
   [0.36470588]
   ...
   [0.47058824]
   [0.44705882]
   [0.41960784]]

  [[0.29803922]
   [0.31764706]
   [0.34509804]
   ...
   [0.45098039]
   [0.43921569]
   [0.4       ]]

  [[0.25882353]
   [0.30196078]
   [0.31764706]
   ...
   [0.45490196]
   [0.42745098]
   [0.36078431]]]


 [[[0.60784314]
   [0.59215686]
   [0.56862745]
   ...
   [0.59607843]
   [0.63921569]
   [0.63529412]]

  [[0.6627451 ]
   [0.63137255]
   [0.62352941]
   ...
   [0.67843137]
   [0.60784314]
   [0.63529412]]

  [[0.62745098]
   [0.65098039]
   [0.6       ]
   ...
   [0.61568627]
   [0.63921569]
   [0.67058824]]

  ...

  [[0.62352941]
   [0.6       ]
   [0.59607843]
   ...
   [0.6627451 ]
   [0.71372549]
   [0.6745098 ]]

  [[0.61568627]
   [0.58431373]
   [0.61568627]
   ...
   [0.67058824]
   [0.65882353]
   [0.68235294]]

  [[0.61176471]
   [0.60392157]
   [0.61960784]
   ...
   [0.65490196]
   [0.6627451 ]
   [0.66666667]]]]

[2, 1, 4, 4, 2]

Error:

Epoch 1/10
  1/155 [..............................] - ETA: 11s - loss: 1.7916 - acc: 0.3000
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-139-8f19a958861f> in <module>
     12 history = model.fit_generator(trainAug.flow(X_train, y_train, batch_size=batch_size)                           
     13                              ,epochs = 10, validation_data = (X_test, y_test),
---> 14                               steps_per_epoch= X_train.shape[0]// batch_size
     15                              )
     16 #epochs = epochs, validation_data = (X_test, y_test),

/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   1431         shuffle=shuffle,
   1432         initial_epoch=initial_epoch,
-> 1433         steps_name='steps_per_epoch')
   1434 
   1435   def evaluate_generator(self,

/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/training_generator.py in model_iteration(model, data, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch, mode, batch_size, steps_name, **kwargs)
    262 
    263       is_deferred = not model._is_compiled
--> 264       batch_outs = batch_function(*batch_data)
    265       if not isinstance(batch_outs, list):
    266         batch_outs = [batch_outs]

/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py in train_on_batch(self, x, y, sample_weight, class_weight, reset_metrics)
   1173       self._update_sample_weight_modes(sample_weights=sample_weights)
   1174       self._make_train_function()
-> 1175       outputs = self.train_function(ins)  # pylint: disable=not-callable
   1176 
   1177     if reset_metrics:

/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/backend.py in __call__(self, inputs)
   3290 
   3291     fetched = self._callable_fn(*array_vals,
-> 3292                                 run_metadata=self.run_metadata)
   3293     self._call_fetch_callbacks(fetched[-len(self._fetches):])
   3294     output_structure = nest.pack_sequence_as(

/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in __call__(self, *args, **kwargs)
   1456         ret = tf_session.TF_SessionRunCallable(self._session._session,
   1457                                                self._handle, args,
-> 1458                                                run_metadata_ptr)
   1459         if run_metadata:
   1460           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

InvalidArgumentError: Received a label value of 6 which is outside the valid range of [0, 6).  Label values: 0 1 6 4 2 4 2 0 1 2
     [[{{node loss_15/activation_63_loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]

Code:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

import sys
import os
import cv2

DATA_DIR = "/Users/namefolder/PycharmProjects/skin-cancer/HAM10000_images_part_1"

metadata = pd.read_csv(os.path.join(DATA_DIR, 'HAM10000_metadata.csv'))

lesion_type_dict = {'nv': 'Melanocytic nevi',
    'mel': 'Melanoma',
    'bkl': 'Benign keratosis-like lesions ',
    'bcc': 'Basal cell carcinoma',
    'akiec': 'Actinic keratoses',
    'vasc': 'Vascular lesions',
    'df': 'Dermatofibroma'}

metadata['cell_type'] = metadata['dx'].map(lesion_type_dict.get)
metadata['dx_code'] = pd.Categorical(metadata['dx']).codes

# save array of image-id and diagnosis-type (categorical)
metadata = metadata[['image_id', 'dx', 'dx_type', 'dx_code']]

training_data = []

IMG_SIZE = 50

# preparing training data
def creating_training_data(path):
    for img in os.listdir(path):
        try:
            img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_GRAYSCALE)
            new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
            for index, row in metadata.iterrows():
                if (img == row['image_id']+'.jpg') & (row['dx_code'] != 5):
                    try:
                        training_data.append([new_array, row['dx_code']])
                    except Exception as ee:
                        pass
        except Exception as e:
            pass

    return training_data

training_data = creating_training_data(DATA_DIR)

import random

random.shuffle(training_data)

# Splitting data into X features and Y label
X_train = []
y_train = []
for features, label in training_data:
    X_train.append(features)
    y_train.append(label)

# Reshaping of the data - required by Tensorflow and Keras (*necessary step of deep-learning using these repos)
X_train = np.array(X_train).reshape(-1, IMG_SIZE, IMG_SIZE, 1)

# Normalize data - to reduce processing requirements
X_train = X_train/255.0

# model configuration
model = Sequential()
model.add(Conv2D(64, (3,3), input_shape = X_train.shape[1:]))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3,3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(64))

model.add(Dense(6))
model.add(Activation("softmax"))

model.compile(loss="mean_squared_error",
             optimizer="adam",
             metrics=["accuracy"])

# Data Augmentation - Repo enabler
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ReduceLROnPlateau

# initialize the training data augmentation object
trainAug = ImageDataGenerator(
    rescale=1 / 255.0,
    rotation_range=20,
    zoom_range=0.05,
    width_shift_range=0.05,
    height_shift_range=0.05,
    shear_range=0.05,
    horizontal_flip=True,
    fill_mode="nearest")

# initialize the validation (and testing) data augmentation object
valAug = ImageDataGenerator(rescale=1 / 255.0)

# set a learning rate annealer
learning_rate_reduction = ReduceLROnPlateau(monitor='val_acc',
                                            patience=3,
                                            verbose=1,
                                            factor=0.5,
                                            min_lr=0.00001)

# Augmented Images model development
trainAug.fit(X_train)

# Fit the model
epochs = 10
batch_size = 10

history = model.fit_generator(trainAug.flow(X_train, y_train, batch_size=batch_size),
                              epochs=10,
                              validation_data=(X_test, y_test),
                              steps_per_epoch=X_train.shape[0] // batch_size)

Originally you had 7 labels, so your code expected the labels 0, 1, 2, 3, 4, 5, 6.

You removed label 5 from the dataset, fine. Now you have 6 labels in total.
Your code expects: 0, 1, 2, 3, 4, 5

But your data contains: 0, 1, 2, 3, 4, 6

After dropping label 5, you need to convert label 6 into 5.


Something like this:

if (img == row['image_id']+'.jpg') & (row['dx_code'] > 5):
    try:
        # shift codes above the removed class down by one (6 -> 5)
        training_data.append([new_array, row['dx_code'] - 1])
    except Exception as ee:
        pass
elif (img == row['image_id']+'.jpg') & (row['dx_code'] < 5):
    try:
        # codes below the removed class are unchanged
        training_data.append([new_array, row['dx_code']])
    except Exception as ee:
        pass
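
A more compact route (a sketch of my own, assuming the removed class keeps code 5 as above) is to keep the original `!= 5` filter and remap the collected labels in one step instead of branching inside the loop:

import numpy as np

# Hypothetical one-step remap: after collecting training_data, shift every
# code above the removed class (5) down by one so labels are contiguous
# in [0, 6).
y_train = np.array(y_train)
y_train = np.where(y_train > 5, y_train - 1, y_train)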
