
Stratified train-test splitting a Tensorflow dataset

I am currently working with a fairly large image dataset, which I loaded using ImageDataGenerator from tensorflow.keras in Python. Because the classes in my data are very imbalanced, I want to do a stratified train-test split in the hope of achieving higher accuracy.

I know how to do a simple random train-test split using ImageDataGenerator, but I couldn't find any equivalent of the stratified train_test_split available in sklearn.

Is there any way to do a stratified train-test split of a tensorflow.data.Dataset? And if not, how do you deal with large imbalanced datasets? I would very much appreciate your help!

Here is the relevant code:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator()
dataset = datagen.flow_from_directory(
    path_images, 
    target_size=(ImageHeight, ImageWidth), 
    color_mode='rgb', 
    class_mode='sparse', 
    batch_size=BatchSize, 
    shuffle=True, 
    seed=Seed,
)

flow(x, y=None, batch_size=32, shuffle=True, sample_weight=None, seed=None, save_to_dir=None, save_prefix='', save_format='png', ignore_class_split=False, subset=None)
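
For illustration only, here is a minimal sketch of what a stratified split does, written in plain Python so it does not depend on any particular TensorFlow or sklearn version. The function name stratified_split, its parameters, and the toy labels are all hypothetical and not taken from the question; the idea is simply to shuffle the sample indices within each class and cut off the same fraction of every class for the test set.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) index lists in which every class
    keeps roughly the same share as in the full label list.

    Hypothetical helper for illustration; not part of any library.
    """
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        # For classes with 2+ samples, keep at least one sample on each side.
        if len(indices) > 1:
            n_test = min(max(n_test, 1), len(indices) - 1)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    # Shuffle again so the splits are not ordered by class.
    rng.shuffle(train_idx)
    rng.shuffle(test_idx)
    return train_idx, test_idx


# Toy, made-up labels: 90 samples of class 0 and 10 of class 1.
labels = [0] * 90 + [1] * 10
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
```

Assuming you have one label per image file, the two index lists could then be used to select the file paths for each split before building the input pipeline. If sklearn is available, train_test_split with its stratify argument applied to the list of file paths and labels should give an equivalent result.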
