Using ImageDataGenerator with regression output

I want to use TensorFlow's ImageDataGenerator.flow_from_directory() to load my dataset, but my output is a regression, not a classification. I therefore set class_mode=None so that no labels are assigned to my data, but now I have to label my training examples myself and I don't know how (I have my labels as a list). Is there a way around this?

Example code:

labels = [0.75, 21.60, 10.12] # example labels

# load dataset from directory
image_generator = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)
train_data = image_generator.flow_from_directory(batch_size=batch_size, directory=train_x_dir, target_size=(224, 224), class_mode=None, shuffle=False)

# assign labels to training examples
# ???

Since I got no direct answer, I assume this can't be done in TF 2.3.

So I referred to a thread mentioned by AerysS, specifically to the answer from user timehaven, and used his code to generate batches from a pandas DataFrame using Keras' load_img and img_to_array. The code was written for Python 2.7, so I made a few changes to port it, and it works for me with Python 3.6.8.
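The post does not list the porting changes. One typical difference between Python 2.7 and Python 3 is the iterator protocol: the method that advances an iterator is spelled `__next__` instead of `next`. The following is a minimal sketch of a lock-protected iterator in Python 3 style. The names `LockedIterator`, `count_up` and `drain` are hypothetical and chosen only for this illustration.

```python
import threading


class LockedIterator:
    """Serialize next() calls on a wrapped iterator (Python 3 protocol)."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):  # Python 2 spelled this method `next`
        with self._lock:
            return next(self._it)


def count_up(n):
    """Plain generator; advancing it from two threads at once is not safe."""
    for i in range(n):
        yield i


def drain(shared, out):
    """Consume items from the shared iterator until it is exhausted."""
    for item in shared:
        out.append(item)


shared = LockedIterator(count_up(1000))
results = []
workers = [threading.Thread(target=drain, args=(shared, results))
           for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Each item ends up in `results` exactly once, because only one thread at a time can advance the underlying generator.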

data_generator.py

from __future__ import print_function

from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array

import numpy as np
import pandas as pd
import bcolz
import threading

import os
import sys
import glob
import shutil


bcolz_lock = threading.Lock()
# old_blosc_nthreads = bcolz.blosc_set_nthreads(1)
# assert bcolz.blosc_set_nthreads(1) == 1

def safe_bcolz_open(fname, idx=None, debug=False):
    with bcolz_lock:
        if idx is None:
            X2 = bcolz.open(fname)
        else:
            X2 = bcolz.open(fname)[idx]

        if debug:
            df_debug = pd.DataFrame(X2, index=idx)

            assert X2.shape[0] == len(idx)
            assert X2.shape == df_debug.shape

            df_debug = df_debug.astype(int)

            test_idx = (df_debug.subtract(df_debug.index.values, axis=0) == 0).all(axis=1)
            assert test_idx.all(), df_debug[~test_idx]
    return X2


class threadsafe_iter:
    """Wrap an iterator so that only one thread at a time can advance it."""
    def __init__(self, it):
        self.it = it
        self.lock = threading.Lock()
        assert self.lock is not bcolz_lock

    def __iter__(self):
        return self

    def __next__(self):
        with self.lock:
            return next(self.it)

    # Python 2 style alias; generators have no .next() method in Python 3.
    next = __next__


def threadsafe_generator(f):
    def g(*a, **kw):
        return threadsafe_iter(f(*a, **kw))
    return g


@threadsafe_generator
def generator_from_df(df, batch_size, target_size, features=None,
                      debug_merged=False):
    if features is not None:
        assert os.path.exists(features)
        assert safe_bcolz_open(features).shape[0] == df.shape[0], "Features rows must match df!"

    nbatches, n_skipped_per_epoch = divmod(df.shape[0], batch_size)

    count = 1
    epoch = 0

    # New epoch.
    while True:
        df = df.sample(frac=1)  # frac=1 is same as shuffling df.
        epoch += 1
        i, j = 0, batch_size

        # Mini-batches within epoch.
        mini_batches_completed = 0
        for _ in range(nbatches):
            sub = df.iloc[i:j]
            try:
                # Load each image, resize it to target_size and scale pixels from [0, 255] to [-1, 1].
                X = np.array([(2 * (img_to_array(load_img(f, target_size=target_size)) / 255.0 - 0.5)) for f in sub.imgpath])
                Y = sub.target.values
                if features is None:
                    mini_batches_completed += 1
                    yield X, Y
                else:
                    X2 = safe_bcolz_open(features, sub.index.values, debug=debug_merged)
                    mini_batches_completed += 1
                    yield [X, X2], Y
            except IOError:
                # Skip the whole batch if any of its images cannot be read.
                count -= 1
            i = j
            j += batch_size
            count += 1
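Two details of generator_from_df may be easier to see in isolation: how one epoch is cut into full batches (rows left over after `divmod` are skipped for that epoch), and how pixel values are scaled. The following is a rough sketch in plain Python. The helper names `epoch_slices` and `scale_pixel` are hypothetical and do not appear in the original code.

```python
def epoch_slices(n_rows, batch_size):
    """Yield (start, stop) row ranges for the full batches of one epoch."""
    n_batches, _n_skipped = divmod(n_rows, batch_size)
    start, stop = 0, batch_size
    for _ in range(n_batches):
        yield start, stop
        start = stop
        stop += batch_size


def scale_pixel(value):
    """Map an 8-bit pixel value from [0, 255] to [-1.0, 1.0]."""
    return 2.0 * (value / 255.0 - 0.5)


# 10 rows with a batch size of 4: two full batches, 2 rows skipped.
slices = list(epoch_slices(10, 4))
low, mid, high = scale_pixel(0), scale_pixel(127.5), scale_pixel(255)
```

Note that this scaling to [-1, 1] differs from the rescale=1./255 used in the question, which maps pixels to [0, 1].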

train.py

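import os
from glob import glob
import pandas as pd

from data_generator import generator_from_df

def construct_dataframe(img_path, labels_path):
    data = {}
    # sorted() makes the file order deterministic; the labels must be listed in the same order.
    data['imgpath'] = sorted(glob(os.path.join(img_path, '*.png')))
    # load_labels is a user-supplied helper (not shown) that returns one target per image.
    data['target'] = load_labels(labels_path)
    return pd.DataFrame(data)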

train_df = construct_dataframe(train_x_dir, train_y_dir)
train_generator = generator_from_df(train_df, batch_size, (img_size, img_size))

# load and compile model
# ...

model.fit(train_generator, ...)
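Because generator_from_df loops indefinitely, Keras cannot tell on its own where an epoch ends, so `steps_per_epoch` usually has to be passed to `model.fit`. The following is a small sketch of how that number could be derived, assuming the generator yields only full batches as in the code above. The function name `full_batches_per_epoch` is hypothetical.

```python
def full_batches_per_epoch(n_samples, batch_size):
    """Return (steps, skipped): full batches per epoch and leftover rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return divmod(n_samples, batch_size)


# Example: 1000 training images with a batch size of 32.
steps, skipped = full_batches_per_epoch(1000, 32)

# Possible use with the generator above (the epochs value is illustrative):
# model.fit(train_generator, steps_per_epoch=steps, epochs=10)
```

In train.py, `n_samples` would be `len(train_df)`.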
