Memory leak in Tensorflow.js: How to manage memory for a large dataset created using tf.data.generator?

There is a clear memory leak in my code that causes my used memory to go from 5 GB to 15.7 GB in a span of 40-60 seconds, and then crashes my program with an OOM error. I believe this happens when I am creating tensors to form the dataset, not when I am training the model. My data consists of 25,000 images stored locally. As such, I used the built-in tensorflow.js function tf.data.generator(generator) described here to create the dataset. I believe this is the best and most efficient way to create a large dataset, as mentioned here.

Example

I used a helper class to create my dataset by passing in the path to the images:

const fs = require('fs');
const path = require('path');
const tf = require('@tensorflow/tfjs-node');

class Dataset{

    constructor(dirPath){
        this.paths = this.#generatePaths(dirPath);
    }

    // Generate file paths for all images to be read as buffer
    #generatePaths = (dirPath) => {
        const dir = fs.readdirSync(dirPath, {withFileTypes: true})
            .filter(dirent => dirent.isDirectory())
            .map(folder => folder.name)
        let imagePaths = [];
        dir.forEach(folder => {
            fs.readdirSync(path.join(dirPath, folder)).filter(file => {
                return path.extname(file).toLocaleLowerCase() === '.jpg'
            }).forEach(file => {
                imagePaths.push(path.resolve(path.join(dirPath, folder, file)))
            })
        })
        return imagePaths;
    }

    // Convert image buffer to a Tensor object
    #generateTensor = (imagePath) => {
        const buffer = fs.readFileSync(imagePath);
        return tf.node.decodeJpeg(buffer, 3)
            .resizeNearestNeighbor([128, 128])
            .toFloat()
            .div(tf.scalar(255.0))
    }

    // Label the data with the corresponding class
    #labelArray(index){return Array.from({length: 2}, (_, k) => k === index ? 1 : 0)};

    // Javascript generator function passed to tf.data.generator()
    * #imageGenerator(){
        for(let i=0; i<this.paths.length; ++i){
            let image;
            try {
                image = this.#generateTensor(this.paths[i]);
            } catch (error) {
                continue;
            }
            console.log(tf.memory());
            yield image;
        }
    }

    // Javascript generator function passed to tf.data.generator()
    * #labelGenerator(){
        for(let i=0; i<this.paths.length; ++i){
            const classIndex = (path.basename(path.dirname(this.paths[i])) === 'Cat' ? 0 : 1);
            const label = tf.tensor1d(this.#labelArray(classIndex), 'int32')
            console.log(tf.memory());
            yield label;
        }
    }

    // Load data
    loadData = () => {
        console.log('\n\nLoading data...')
        const xs = tf.data.generator(this.#imageGenerator.bind(this));
        const ys = tf.data.generator(this.#labelGenerator.bind(this));
        const ds = tf.data.zip({xs, ys}).batch(32).shuffle(32);
        return ds;
    }
}

And I am creating my dataset like this:

const trainDS = new Dataset(trainPath).loadData();

Question

I am aware of the built-in tfjs methods for managing memory, such as tf.tidy() and tf.dispose(). However, I was unable to implement them in a way that stops the memory leak, since the tensors are generated by the tf.data.generator function.

How would I go about successfully disposing of the tensors from memory after they are yielded by the generators?

Answer

Every tensor you create, you need to dispose of; there is no garbage collection as you're used to in JS. That's because tensors are not kept in JS memory (they can be in GPU memory, a WASM module, etc.), so the JS engine cannot track them. They are more like pointers than normal variables.
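You can see where tensors accumulate with tf.memory(), which you are already logging. As a minimal sketch (the helper name countLeaks is mine, not a tfjs API): call a function, dispose its intended return value, and check whether numTensors went back to its starting point.

const tf = require('@tensorflow/tfjs-node');

// Call fn, dispose its return value, and compare numTensors before
// and after. A positive result means fn created interim tensors
// that were never disposed.
function countLeaks(fn) {
    const before = tf.memory().numTensors;
    tf.dispose(fn());
    return tf.memory().numTensors - before;
}

Running your #generateTensor through a check like this would report several leaked intermediates per image.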

For example, in your code:

        return tf.node.decodeJpeg(buffer, 3)
            .resizeNearestNeighbor([128, 128])
            .toFloat()
            .div(tf.scalar(255.0))

Each chained operation creates an interim tensor that never gets disposed of. Read it this way:

const decoded = tf.node.decodeJpeg(buffer, 3)
const resized = decoded.resizeNearestNeighbor([128, 128])
const casted = resized.toFloat();
const normalized = casted.div(tf.scalar(255.0))
return normalized;

So you have 4 large tensors allocated somewhere. What you're missing is:

tf.dispose([decoded, resized, casted]);

And later, when you're done with the image, also tf.dispose(image), which disposes normalized.

The same goes for everything that is a tensor.
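Putting this together, a manually disposed version of the helper might look like the sketch below. Note that tf.scalar(255.0) is itself a tensor that needs disposing too:

    #generateTensor = (imagePath) => {
        const buffer = fs.readFileSync(imagePath);
        const decoded = tf.node.decodeJpeg(buffer, 3);
        const resized = decoded.resizeNearestNeighbor([128, 128]);
        const casted = resized.toFloat();
        const scale = tf.scalar(255.0);
        const normalized = casted.div(scale);
        tf.dispose([decoded, resized, casted, scale]); // release intermediates
        return normalized; // the caller must dispose this one
    }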

I am aware of the built-in tfjs methods for managing memory, such as tf.tidy() and tf.dispose(). However, I was unable to implement them in a way that stops the memory leak, since the tensors are generated by the tf.data.generator function.

You say you're aware, but you're doing exactly the same thing by creating interim tensors that you're not disposing of.

You can help yourself by wrapping such functions in tf.tidy(), which creates a local scope so that everything not returned gets automatically released.

For example:

    #generateTensor = (imagePath) => tf.tidy(() => {
        const buffer = fs.readFileSync(imagePath);
        return tf.node.decodeJpeg(buffer, 3)
            .resizeNearestNeighbor([128, 128])
            .toFloat()
            .div(tf.scalar(255.0));
    });

This means the interim tensors will get disposed of, but you still need to dispose of the return value once you're done with it.
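For the tensors the generators yield, the same rule applies once the consumer is finished with them. A sketch using manual iteration over the dataset (if you train with model.fitDataset() instead, keep an eye on tf.memory().numTensors to confirm the batches are actually being released):

// Inside an async function:
const ds = new Dataset(trainPath).loadData();
await ds.forEachAsync(({xs, ys}) => {
    // ... use the batch here ...
    tf.dispose([xs, ys]); // release the batch tensors once consumed
});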
