

CSV File Dataset Augmentation Using Keras

I am working on an already implemented project on Kaggle that deals with image classification. There are 6 classes to predict in total (Angry, Happy, Sad, etc.). I have implemented a CNN model and am currently using only 4 classes (the ones with the most images), but the model is overfitting: validation accuracy peaks at 53%. I have tried several things, but none of them seems to improve accuracy. I then saw people mention something called data augmentation and wanted to give it a go, since it seems to have the potential to increase accuracy. However, I am stuck on an error that I cannot figure out.
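Data augmentation means applying small random transformations to each training image every time it is drawn, so the model rarely sees exactly the same pixels twice. Keras' ImageDataGenerator does this on the fly (for example, rotation_range=90 rotates each image by a random angle of up to 90 degrees in either direction). To make the idea concrete without TensorFlow, here is a minimal NumPy sketch. It deliberately swaps in horizontal flips and small shifts instead of rotations, and augment_batch and max_shift are hypothetical names, not Keras API.

```python
import numpy as np


def augment_batch(images: np.ndarray, max_shift: int = 4, seed=None) -> np.ndarray:
    """Randomly flip and shift a batch of images shaped (N, H, W, C).

    Simplified stand-in for horizontal flips plus width/height shifts;
    it does NOT rotate. The input array is left unchanged.
    """
    rng = np.random.default_rng(seed)
    out = np.empty_like(images)
    for i, img in enumerate(images):
        if rng.random() < 0.5:
            img = img[:, ::-1, :]                     # mirror left-right (view, no copy)
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        h, w = img.shape[:2]
        shifted = np.zeros_like(img)                  # vacated border is filled with zeros
        # Copy the overlapping region of the image into its shifted position.
        shifted[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
            img[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
        out[i] = shifted
    return out
```

Calling augment_batch(x_batch) once per training step gives a differently perturbed copy of the same batch each time, which is the property that helps against overfitting; the stored dataset itself never changes.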

Distribution of the dataset:

[Figure: 6_classes, distribution of images across the 6 classes]
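Restricting training to the 4 largest classes can be done directly on the DataFrame. A minimal pandas sketch, assuming df_training has an integer emotion column as in the question's code; keep_top_classes is a hypothetical helper, and re-encoding the kept labels to 0..k-1 is an added step so they line up with a k-way softmax output layer:

```python
import pandas as pd


def keep_top_classes(df: pd.DataFrame, label_col: str = "emotion", k: int = 4) -> pd.DataFrame:
    """Keep only rows of the k most frequent classes, relabelled to 0..k-1."""
    counts = df[label_col].value_counts()          # images per class, most frequent first
    top = sorted(counts.index[:k])                  # labels of the k largest classes
    subset = df[df[label_col].isin(top)].copy()     # drop all other classes
    remap = {old: new for new, old in enumerate(top)}
    subset[label_col] = subset[label_col].map(remap)
    return subset.reset_index(drop=True)
```

For example, keep_top_classes(df_training, k=4)["emotion"].value_counts() shows how many images each kept class contributes. If two classes tie on count, which one is kept is not guaranteed, so it is worth checking the counts first.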

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from matplotlib.pyplot import imread, imshow, subplots, show


def plot(data_generator):
    """
    Plots 4 images generated by an object of the ImageDataGenerator class.
    """
    data_generator.fit(df_training)
    image_iterator = data_generator.flow(df_training)

    # Plot the images given by the iterator
    fig, rows = subplots(nrows=1, ncols=4, figsize=(18,18))
    for row in rows:
        row.imshow(image_iterator.next()[0].astype('int'))
        row.axis('off')
    show()

x_train = df_training.drop("emotion",axis=1)
image = x_train[1:2].values.reshape(48, 48)
x_train = x_train.values.reshape(x_train.shape[0], 48, 48,1)
x_train = x_train.astype("float32")
image = image.astype("float32")
image = x_train[1:2].reshape(48, 48)

# Creating a dataset which contains just one image.
images = image.reshape((1, image.shape[0], image.shape[1]))

imshow(images[0])
show()
print(x_train.shape)
data_generator = ImageDataGenerator(rotation_range=90)
plot(data_generator)

Error:

ValueError: Input to .fit() should have rank 4. Got array with shape: (28709, 2305)

I have already reshaped my data into a 4D array, but for some reason the error reports my data as 2D. This is the output of print(x_train.shape): (28709, 48, 48, 1)

x_train holds the dataset, and x_train[1:2] accesses one image.
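Note that x_train[1:2] is a slice, so it still wraps the single image in a batch axis, whereas x_train[1] is the image itself. A small NumPy sketch with a zero-filled stand-in array (hypothetical, not the real data) shows the resulting shapes:

```python
import numpy as np

# Hypothetical stand-in for x_train: 5 grayscale images of 48x48 pixels.
x_train = np.zeros((5, 48, 48, 1), dtype="float32")

batch_of_one = x_train[1:2]      # slicing keeps the batch axis
single = x_train[1]              # integer indexing drops it
for_imshow = single[:, :, 0]     # 2-D array suitable for matplotlib's imshow

print(batch_of_one.shape, single.shape, for_imshow.shape)
# (1, 48, 48, 1) (48, 48, 1) (48, 48)
```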

P.S. Is there any other approach you would recommend for improving accuracy on this dataset? If anything in this partial code is unclear, or you have further questions about the dataset, please let me know.

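You call data_generator.fit() and data_generator.flow() on df_training, not on x_train. df_training is still the original 2-D DataFrame: 28709 rows by 2305 columns, i.e. the emotion label plus 48 x 48 = 2304 pixel values, which is exactly the shape reported in the error. Reshaping x_train does not change df_training.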
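A minimal sketch of turning the flat DataFrame into the rank-4 array the generator expects, assuming (as the error shape suggests) one emotion column plus 48 x 48 pixel columns; to_image_tensor is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def to_image_tensor(df: pd.DataFrame, label_col: str = "emotion", side: int = 48):
    """Split a flat pixel DataFrame into (images, labels).

    Each row is assumed to hold one label column plus side*side pixel columns,
    e.g. 1 + 48*48 = 2305 columns.
    """
    labels = df[label_col].to_numpy()
    pixels = df.drop(columns=label_col).to_numpy(dtype="float32")
    if pixels.shape[1] != side * side:
        raise ValueError(f"expected {side * side} pixel columns, got {pixels.shape[1]}")
    # (N, side*side) -> (N, side, side, 1): rank 4, one grayscale channel.
    images = pixels.reshape(-1, side, side, 1)
    return images, labels
```

The arrays, not the DataFrame, then go to the generator: x_train, y_train = to_image_tensor(df_training), followed by data_generator.flow(x_train, y_train, batch_size=32). Calling data_generator.fit(x_train) is only needed when featurewise_center, featurewise_std_normalization or zca_whitening is enabled; with rotation_range alone it can be skipped.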

As for other ways to avoid overfitting, TensorFlow has an official tutorial on the topic with some good suggestions: https://www.tensorflow.org/tutorials/keras/overfit_and_underfit
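One technique the tutorial uses is early stopping: stop training once validation loss stops improving. Keras provides this as the tf.keras.callbacks.EarlyStopping callback (e.g. monitor="val_loss", patience=...). The dependency-free sketch below only illustrates the patience logic on a list of per-epoch validation losses; early_stopping_epoch is a hypothetical helper, not part of Keras, and its edge-case behaviour may differ from the real callback.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the index of the epoch at which training would stop.

    Training stops once val_loss has failed to drop by more than min_delta
    below the best value so far for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, wait = loss, 0      # new best value: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch          # patience exhausted, stop here
    return len(val_losses) - 1        # never triggered: all epochs ran
```

With val_losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.6] and patience=3, training stops at epoch 5, before the later improvement at epoch 6; a larger patience trades extra epochs for tolerance to noisy validation curves.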
