
Create a mixed data generator (images, csv) in Keras

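I'm building a model with multiple inputs, as shown on pyimagesearch, but I can't load all the images into RAM. So I'm trying to create a generator that uses flow_from_directory and, for each image being processed, fetches its extra attributes from a CSV file.

Question: how do I get the CSV attributes that correspond to each batch of images produced by the image generator?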

import pandas as pd
from keras.preprocessing.image import ImageDataGenerator

def get_combined_generator(images_dir, csv_dir, split, *args):
    """
    Creates train/val generators on images and csv data.

    Arguments:

    images_dir : string
        Path to a directory with subdirectories for each class.

    csv_dir : string
        Path to a directory containing train/val csv files with extra attributes.

    split : string
        Current split being used (train, val or test)
    """
    img_width, img_height, batch_size = args

    datagen = ImageDataGenerator(
        rescale=1. / 255)

    generator = datagen.flow_from_directory(
        f'{images_dir}/{split}',
        target_size=(img_height, img_width),  # Keras expects (height, width)
        batch_size=batch_size,
        shuffle=True,
        class_mode='categorical')

    df = pd.read_csv(f'{csv_dir}/{split}.csv', index_col='image')

    def my_generator(image_gen, data):
        while True:
            i = image_gen.batch_index
            batch = image_gen.batch_size
            # Positional slice: this can only match the images if the CSV rows are
            # in the same order as image_gen.filenames and shuffle=False
            row = data[i * batch:(i + 1) * batch]
            images, labels = image_gen.next()
            yield [images, row], labels

    csv_generator = my_generator(generator, df)

    return csv_generator
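A minimal sketch of the alignment that the slicing above needs, under stated assumptions. flow_from_directory records its files in generator.filenames (paths relative to the directory, such as cat/img_001.JPG), and with shuffle=False the batches follow that order. Reordering the DataFrame to the same order makes row k describe image k, so a positional slice refers to the same files as the matching image batch. The helper name align_rows_to_filenames is hypothetical, and it assumes the CSV index holds the bare file name without folder or extension (as index_col='image' suggests). With shuffle=True no positional slice can work, because the batch order is a random permutation the code never sees.

```python
import os

import pandas as pd


def align_rows_to_filenames(df, filenames):
    """Return df's rows reordered to match a list of image file paths.

    Hypothetical helper; assumes df is indexed by the image file name without
    directory or extension (e.g. 'img_001' for 'cat/img_001.JPG').
    """
    keys = [os.path.splitext(os.path.basename(f))[0] for f in filenames]
    missing = [k for k in keys if k not in df.index]
    if missing:
        raise KeyError(f'no CSV row for images: {missing[:5]}')
    return df.loc[keys]


# Toy data standing in for generator.filenames and the CSV file
df = pd.DataFrame({'age': [3, 7, 5], 'weight': [1.5, 2.0, 4.2]},
                  index=pd.Index(['img_b', 'img_a', 'img_c'], name='image'))
filenames = ['cat/img_a.JPG', 'cat/img_c.JPG', 'dog/img_b.JPG']

aligned = align_rows_to_filenames(df, filenames)
batch_size = 2
first_batch_rows = aligned.iloc[0 * batch_size:1 * batch_size].to_numpy()
print(first_batch_rows)
```

With the generator above this would be aligned = align_rows_to_filenames(df, generator.filenames) together with shuffle=False. Reading image_gen.batch_index relies on an internal counter of the Keras iterator, so treat that part as fragile; the custom generators in the answers below avoid it by looking rows up by image name.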

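Given this relatively specific situation, I'd suggest writing a custom generator. Something like the following (modified from a similar answer here) should suffice: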

import os
import random

import cv2
import numpy as np
import pandas as pd

def generator(image_dir, csv_dir, batch_size, num_classes):
    i = 0
    image_file_list = os.listdir(image_dir)
    random.shuffle(image_file_list)
    while True:
        batch_x = {'images': list(), 'other_feats': list()}  # use a dict for multiple inputs
        batch_y = list()
        for b in range(batch_size):
            if i == len(image_file_list):
                i = 0
                random.shuffle(image_file_list)
            # os.listdir() returns bare file names, so join them with the directory
            image_file_name = image_file_list[i]
            image_file_path = os.path.join(image_dir, image_file_name)
            # The matching CSV shares the image's base name (img_01.png -> img_01.csv)
            csv_file_path = os.path.join(csv_dir, image_file_name.replace('.png', '.csv'))
            i += 1
            # preprocess_image and preprocess_feats are your own preprocessing functions
            image = preprocess_image(cv2.imread(image_file_path))
            csv_file = pd.read_csv(csv_file_path)
            other_feat = preprocess_feats(csv_file)
            batch_x['images'].append(image)
            batch_x['other_feats'].append(other_feat)
            # Assumes the CSV has a 'class' column holding an integer class id
            batch_y.append(int(csv_file['class'].iloc[0]))

        batch_x['images'] = np.array(batch_x['images'])  # convert each list to array
        batch_x['other_feats'] = np.array(batch_x['other_feats'])
        batch_y = np.eye(num_classes)[batch_y]  # one-hot encode the integer labels
        yield batch_x, batch_y

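You can then use Keras's fit_generator() function to train your model. Because the generator loops forever, also pass steps_per_epoch; in TensorFlow 2.1 and later, fit_generator() is deprecated and Model.fit() accepts a generator directly.

Obviously, this assumes that you have CSV files with the same names as the image files, and that you have your own preprocessing functions for the images and the CSV files. The dictionary keys ('images', 'other_feats') must also match the names of the model's Input layers.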
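The loop above rests on three small pieces that are easy to get wrong: pairing each image with its CSV file, cycling through the file list with a reshuffle on every pass, and one-hot encoding integer labels with np.eye. A minimal, I/O-free sketch of each is below; the helper names (paired_csv_path, index_stream, one_hot) are hypothetical. paired_csv_path uses os.path.splitext instead of str.replace('.png', '.csv'), so only the final extension changes and other extensions such as .jpg or .PNG are handled too.

```python
import os
import random

import numpy as np


def paired_csv_path(image_path, csv_dir):
    """Return the CSV path in csv_dir whose base name matches image_path."""
    stem, _ext = os.path.splitext(os.path.basename(image_path))  # drop only the last extension
    return os.path.join(csv_dir, stem + '.csv')


def index_stream(n, seed=None):
    """Yield indices 0..n-1 forever, in a fresh random order on every pass."""
    rng = random.Random(seed)
    order = list(range(n))
    while True:
        rng.shuffle(order)
        yield from order


def one_hot(labels, num_classes):
    """One-hot encode integer class ids by picking rows of an identity matrix."""
    return np.eye(num_classes)[np.asarray(labels, dtype=int)]


print(paired_csv_path('images/cat_01.png', 'csv'))
stream = index_stream(5, seed=0)
first_pass = [next(stream) for _ in range(5)]
second_pass = [next(stream) for _ in range(5)]
print(sorted(first_pass), sorted(second_pass))
print(one_hot([2, 0, 1], num_classes=3))
```

Like the loop above, a batch can straddle two passes when batch_size does not divide the number of files.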

I found a solution using a custom generator, based on Luke's answer:

import os
import random
import pandas as pd
import numpy as np
from glob import glob
from keras.preprocessing import image as krs_image
from keras.preprocessing.image import ImageDataGenerator

# images_dir, csv_dir, split, img_height, img_width and num_classes
# are assumed to be defined earlier (as in the question)

# Create the arguments for image preprocessing
data_gen_args = dict(
    horizontal_flip=True,
    brightness_range=[0.5, 1.5],
    shear_range=10,
    channel_shift_range=50,
    rescale=1. / 255,
)

# Create an empty data generator
datagen = ImageDataGenerator()

# Read the image list and csv
image_file_list = glob(f'{images_dir}/{split}/**/*.JPG', recursive=True)
# The first CSV column is assumed to hold the image name (without extension)
df = pd.read_csv(f'{csv_dir}/{split}.csv', index_col=0)
random.shuffle(image_file_list)

def custom_generator(images_list, dataframe, batch_size):
    i = 0
    while True:
        batch = {'images': [], 'csv': [], 'labels': []}
        for b in range(batch_size):
            if i == len(images_list):
                i = 0
                random.shuffle(images_list)
            # Read image from list and convert to array
            image_path = images_list[i]
            image_name = os.path.basename(image_path).replace('.JPG', '')
            image = krs_image.load_img(image_path, target_size=(img_height, img_width))
            # NOTE: apply_transform does not augment the image here; see the comment below
            image = datagen.apply_transform(image, data_gen_args)
            image = krs_image.img_to_array(image)

            # Read data from csv using the name of current image
            csv_row = dataframe.loc[image_name, :]
            label = csv_row['class']
            csv_features = csv_row.drop(labels='class')

            batch['images'].append(image)
            batch['csv'].append(csv_features)
            batch['labels'].append(label)

            i += 1

        batch['images'] = np.array(batch['images'])
        batch['csv'] = np.array(batch['csv'])
        # Convert labels to categorical values (labels must be integer class ids)
        batch['labels'] = np.eye(num_classes)[batch['labels']]

        yield [batch['images'], batch['csv']], batch['labels']
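The name-based lookup in this answer can be exercised without any image files by passing the image loader in as a function. A minimal sketch under assumptions: batch_from_dataframe and load_image are hypothetical names, the DataFrame index holds the image names, and the label column holds integer class ids. Converting all feature rows at once with to_numpy(dtype='float32'), rather than np.array over a list of Series, yields one numeric float32 array (or a clear error if a column is not numeric) instead of a possible object array.

```python
import numpy as np
import pandas as pd


def batch_from_dataframe(image_names, dataframe, load_image, num_classes, label_col='class'):
    """Build one ([images, features], one_hot_labels) batch for a two-input model."""
    rows = dataframe.loc[list(image_names)]                    # rows in the same order as image_names
    labels = rows[label_col].to_numpy(dtype=int)               # assumes integer class ids
    features = rows.drop(columns=label_col).to_numpy(dtype='float32')
    images = np.stack([load_image(name) for name in image_names])
    return [images, features], np.eye(num_classes)[labels]


def fake_loader(name):
    """Stand-in for load_img + img_to_array: returns a blank 4x4 RGB image."""
    return np.zeros((4, 4, 3), dtype='float32')


df = pd.DataFrame({'age': [3, 7], 'weight': [1.5, 2.0], 'class': [1, 0]},
                  index=['img_a', 'img_b'])
(x_img, x_csv), y = batch_from_dataframe(['img_b', 'img_a'], df, fake_loader, num_classes=2)
print(x_img.shape, x_csv.tolist(), y.tolist())
```

Keeping the loader as a parameter makes the row lookup and label encoding testable on their own; in a real generator, load_image would wrap load_img, img_to_array and any augmentation.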

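@Diego Rueda You should be careful: in your code, apply_transform does not do what you want. According to the documentation, this function does not perform random transformations, and it expects different arguments from the ones you pass (a dict of concrete transform parameters such as 'theta', 'flip_horizontal' or 'brightness'). However, it does not raise an error when you pass extra arguments; it simply ignores them, so essentially the image is not transformed at all.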
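Within Keras itself, the usual correction is to pass the arguments to the constructor, datagen = ImageDataGenerator(**data_gen_args), and then call datagen.random_transform(x) followed by datagen.standardize(x) on the array returned by img_to_array: random_transform draws a random augmentation and standardize applies rescale. If you would rather keep the augmentation explicit and independent of Keras, here is a plain NumPy sketch covering the flip, brightness and channel-shift settings from data_gen_args. It approximates rather than reproduces Keras's behaviour (Keras adjusts brightness through PIL and clips channel shifts to the image's own range), shear is left out, and random_augment is a hypothetical name.

```python
import numpy as np


def random_augment(image, rng, horizontal_flip=True, brightness_range=(0.5, 1.5),
                   channel_shift_range=50.0, rescale=1. / 255):
    """Randomly augment one HxWxC image with pixel values in [0, 255], then rescale."""
    x = image.astype('float32')
    if horizontal_flip and rng.random() < 0.5:
        x = x[:, ::-1, :]                                   # mirror along the width axis
    x = x * rng.uniform(*brightness_range)                 # random brightness factor
    x = x + rng.uniform(-channel_shift_range, channel_shift_range)  # same shift on every channel
    x = np.clip(x, 0.0, 255.0)                             # keep a valid pixel range
    return x * rescale                                     # rescale last, like standardize()


rng = np.random.default_rng(0)
img = np.full((8, 8, 3), 128, dtype='uint8')
out = random_augment(img, rng)
print(out.shape, out.dtype, float(out.min()), float(out.max()))
```

In custom_generator this would replace the apply_transform line and run after the conversion, e.g. image = random_augment(krs_image.img_to_array(image), rng), with rescale then no longer needed elsewhere.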


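Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.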

 
© 2020-2024 STACKOOM.COM