
Keras flow_from_dataframe gives 0 images

I am trying to use the flow_from_dataframe method of Keras to read training and testing images.

Both my training and testing images are in the same directory, and I read the paths from two different csv files.

My code for reading the test images looks like this:

# Read test file
testdf = pd.read_csv("test.csv")

# load images
test_datagen = ImageDataGenerator(rescale=1./255)

test_generator = test_datagen.flow_from_dataframe(
    dataframe=testdf, directory=IMAGE_PATH,
    x_col='image_name', y_col=None,
    has_ext=True, target_size=(10, 10),
    batch_size=32, color_mode='rgb', shuffle=False, class_mode=None)

I get output like this:

Found 0 images.

Similar code for reading the training data works properly. I checked that the images exist at the given path, and they do. What are some possible reasons for this error? How can I debug the issue?

EDIT: This is a regression task, so all images are in a single directory, not in subdirectories as would be expected for a classification task.

EDIT 2: I added usecols=[0] to read_csv, and now test_datagen finds all the images in the directory, not just the ones that are mentioned in the test.csv file.

The issue is caused by NaN values in the dataframe. Ignoring the columns that contain them doesn't work. The solution is to replace the NaN values with something else. For example,

testdf = pd.read_csv("test.csv")
testdf.fillna(0, inplace=True)

This replaces the NaN values with 0. After that, using ImageDataGenerator as usual works.
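To see where the NaN values are before replacing them, a quick per-column count can help. The sketch below is only an illustration: the CSV contents, the score column, and the helper name clean_for_generator are made up, and dropping rows whose filename is missing (rather than filling them with 0) is an assumption about what you want, not part of the fix above.

```python
import io

import pandas as pd


def clean_for_generator(df, x_col, fill_value=0):
    """Hypothetical helper: drop rows with no filename, fill remaining NaNs."""
    # A row without a filename cannot point at an image, so drop it.
    cleaned = df.dropna(subset=[x_col]).copy()
    # Fill NaNs in the other columns, as in the fillna(0) fix.
    cleaned = cleaned.fillna(fill_value)
    # Make sure the filename column holds plain strings.
    cleaned[x_col] = cleaned[x_col].astype(str)
    return cleaned.reset_index(drop=True)


# Made-up CSV standing in for test.csv: one missing filename, one missing score.
csv_text = "image_name,score\na.png,0.5\n,0.7\nc.png,\n"
testdf = pd.read_csv(io.StringIO(csv_text))

# Number of NaN values in each column.
nan_report = testdf.isna().sum().to_dict()
print(nan_report)

cleaned = clean_for_generator(testdf, "image_name")
print(cleaned)
```

The cleaned dataframe can then be passed to flow_from_dataframe in place of testdf.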

I faced the same error and found a solution. I was using an absolute path and the correct DataFrame, and everything looked fine, but the code still threw an "image not found" error.

On inspection, I found that my dataframe contained image names without extensions, while the files in the folder had them. For example, the image name in the DataFrame was 'abc' but the file in the folder was named 'abc.png'. Adding .png to the image names in the DataFrame solves the problem. I tried the code below and it worked:

def append_ext(fn):
    return fn+".png"
train_valid_data["id_code"]=train_valid_data["id_code"].apply(append_ext)
test_data["id_code"]=test_data["id_code"].apply(append_ext)

Let me know if it solves your problem or if you need any further explanation.
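One way to spot this kind of mismatch is to compare the names in the dataframe column with the files actually present in the directory. The following is a minimal sketch using only the standard library; find_missing_files and candidates_for_missing are hypothetical helper names, and the temporary directory stands in for your image folder.

```python
import tempfile
from pathlib import Path


def find_missing_files(filenames, directory):
    """Return the names that do not exist as files inside directory."""
    root = Path(directory)
    return [name for name in filenames if not (root / str(name)).is_file()]


def candidates_for_missing(missing, directory):
    """For each missing name, list files in directory whose stem matches it."""
    by_stem = {}
    for path in Path(directory).iterdir():
        if path.is_file():
            by_stem.setdefault(path.stem, []).append(path.name)
    return {name: sorted(by_stem.get(str(name), [])) for name in missing}


# Demo with a throwaway folder holding abc.png and def.png.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "abc.png").touch()
    (Path(tmp) / "def.png").touch()
    names = ["abc", "def.png"]  # e.g. the values of the x_col column
    missing = find_missing_files(names, tmp)
    candidates = candidates_for_missing(missing, tmp)

print(missing)      # names with no matching file
print(candidates)   # likely files for each missing name
```

With a real dataframe you would pass something like df[x_col].tolist() and the same directory you give to flow_from_dataframe.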

I had the same problem. First, make sure the absolute path passed to the directory parameter is correct.

The filename in my df had the value image.pgm.png, while the actual image file in the folder was named image.pgm.

  1. I changed the filename in the df to image.pgm => still not working.
  2. I renamed the image file from image.pgm to image.pgm.png, which exactly matches the value in the df => worked!
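A possible explanation for step 1 failing is that the generator only accepts filenames ending in certain extensions, and .pgm may not be one of them. The sketch below assumes the accepted set is png, jpg, jpeg, bmp, ppm, tif and tiff, which is what the keras_preprocessing versions I am aware of use; check your installed version, since this list is an assumption here. split_by_extension is a hypothetical helper, and its suffix-based check only approximates the library's own validation.

```python
from pathlib import Path

# Assumption: extensions accepted by flow_from_dataframe in your Keras version.
ASSUMED_ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg", "bmp", "ppm", "tif", "tiff"}


def split_by_extension(filenames, allowed=ASSUMED_ALLOWED_EXTENSIONS):
    """Split filenames into (accepted, rejected) by their final extension."""
    accepted, rejected = [], []
    for name in filenames:
        # Compare the last suffix case-insensitively, without the leading dot.
        ext = Path(str(name)).suffix.lower().lstrip(".")
        if ext in allowed:
            accepted.append(name)
        else:
            rejected.append(name)
    return accepted, rejected


accepted, rejected = split_by_extension(
    ["image.pgm", "image.pgm.png", "photo.JPG", "abc"]
)
print("accepted:", accepted)
print("rejected:", rejected)
```

Under that assumption, image.pgm is rejected while image.pgm.png is accepted, which would be consistent with the two results above.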

I had the same error. What I found was that my directory path was wrong and that the image extension was missing from the data frame.

So make sure that your directory path is correct and that the file names include an extension. You can add one as follows:

def extention_train_data(x):
    return x+".jpg"

Change the .jpg extension if your images use a different one.

Then apply this to your data frame:

train_data['image'] = train_data['image_id'].apply(extention_train_data)

Once the image column contains the file names with their extensions, pass it to flow_from_dataframe:

train_generator = datagen.flow_from_dataframe(
    train_data,
    directory="/kaggle/input/plant-pathology-2020-fgvc7/images/",
    x_col="image",
    y_col="label",
    target_size=size,
    class_mode="binary",
    batch_size=batch_size,
    subset="training",
    shuffle=True,
    seed=42,
)

Okay, so I have been having the same issue. My data labels were in a csv file and the image data was in a separate folder. I thought the issue was caused by the labels and the images in the folder not aligning properly, and I did a whole bunch of work to rectify and process the data. That was not the problem. So, for anyone who is having this issue: I tried @Oussama Ouardini's answer and it worked. Thank you!

I would also add that if you are doing a train/validation split, make sure the initial ImageDataGenerator object you create has validation_split specified.

def extension_train_data(x):
    # Build the file name from the id: "xc" prefix plus ".png" extension.
    return "xc" + str(x) + ".png"


train_df['file_id'] = train_df['file_id'].apply(extension_train_data)

Here is my code:

# Rescale pixel values from the 0-255 range to [0, 1] and
# reserve 20% of the rows for validation.
datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

train_generator = datagen.flow_from_dataframe(
    dataframe=train_df, directory='./img_data/',
    x_col="file_id", y_col="english_cname",
    class_mode="categorical", save_to_dir='./new folder/',
    target_size=(64, 64), subset="training",
    seed=42, batch_size=32, shuffle=False)

val_generator = datagen.flow_from_dataframe(
    dataframe=train_df, directory='./img_data/',
    x_col="file_id", y_col="english_cname",
    class_mode="categorical",
    target_size=(64, 64), subset="validation",
    seed=42, batch_size=32, shuffle=False)

print("\n Sanity check Line.--------")

The output shows that the image filenames were validated successfully. :)

Found 212 validated image filenames belonging to 88 classes.
Found 52 validated image filenames belonging to 88 classes.

Sanity check Line.----------

I hope someone will find this useful. Cheers!
