Memory leak in Tensorflow.js: How to manage memory for a large dataset created using tf.data.generator?

There is a clear memory leak in my code that causes my used memory to go from 5 GB to 15.7 GB in a span of 40-60 seconds, and then crashes my program with an OOM error. I believe this happens when I am creating tensors to form the dataset, not when I am training the model. My data consists of 25,000 images stored locally. As such, I used the built-in tensorflow.js function tf.data.generator(generator) described here to create the dataset. I believe this is the best and most efficient way to create a large dataset, as mentioned here.

Example

I used a helper class to create my dataset by passing in the path to the images:

const fs = require('fs');
const path = require('path');
const tf = require('@tensorflow/tfjs-node');

class Dataset{

    constructor(dirPath){
        this.paths = this.#generatePaths(dirPath);
    }

    // Generate file paths for all images to be read as buffer
    #generatePaths = (dirPath) => {
        const dir = fs.readdirSync(dirPath, {withFileTypes: true})
            .filter(dirent => dirent.isDirectory())
            .map(folder => folder.name)
        let imagePaths = [];
        dir.forEach(folder => {
            fs.readdirSync(path.join(dirPath, folder)).filter(file => {
                return path.extname(file).toLocaleLowerCase() === '.jpg'
            }).forEach(file => {
                imagePaths.push(path.resolve(path.join(dirPath, folder, file)))
            })
        })
        return imagePaths;
    }

    // Convert image buffer to a Tensor object
    #generateTensor = (imagePath) => {
        const buffer = fs.readFileSync(imagePath);
        return tf.node.decodeJpeg(buffer, 3)
            .resizeNearestNeighbor([128, 128])
            .toFloat()
            .div(tf.scalar(255.0))
    }

    // Label the data with the corresponding class
    #labelArray(index){return Array.from({length: 2}, (_, k) => k === index ? 1 : 0)};

    // Javascript generator function passed to tf.data.generator()
    * #imageGenerator(){
        for(let i=0; i<this.paths.length; ++i){
            let image;
            try {
                image = this.#generateTensor(this.paths[i]);
            } catch (error) {
                continue;
            }
            console.log(tf.memory());
            yield image;
        }
    }

    // Javascript generator function passed to tf.data.generator()
    * #labelGenerator(){
        for(let i=0; i<this.paths.length; ++i){
            const classIndex = (path.basename(path.dirname(this.paths[i])) === 'Cat' ? 0 : 1);
            const label = tf.tensor1d(this.#labelArray(classIndex), 'int32')
            console.log(tf.memory());
            yield label;
        }
    }

    // Load data
    loadData = () => {
        console.log('\n\nLoading data...')
        const xs = tf.data.generator(this.#imageGenerator.bind(this));
        const ys = tf.data.generator(this.#labelGenerator.bind(this));
        const ds = tf.data.zip({xs, ys}).batch(32).shuffle(32);
        return ds;
    }
}

And I am creating my dataset like this:

const trainDS = new Dataset(trainPath).loadData();

Question

I am aware of the built-in tfjs methods for managing memory, such as tf.tidy() and tf.dispose(). However, I was unable to implement them in a way that stops the memory leak, since the tensors are generated by the tf.data.generator function.

How would I go about successfully disposing of the tensors from memory after they are yielded by the generators?

Answer

Every tensor you create, you need to dispose of; there is no garbage collection as you're used to in JS. That's because tensors are not kept in JS memory (they can be in GPU memory, a WASM module, etc.), so the JS engine cannot track them. They are more like pointers than normal variables.
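You can see where tensors accumulate with tf.memory(), which you are already logging. As a minimal sketch (the helper name countLeaks is mine, not a tfjs API): call a function, dispose its intended return value, and check whether numTensors went back to its starting point.

const tf = require('@tensorflow/tfjs-node');

// Call fn, dispose its return value, and compare numTensors before
// and after. A positive result means fn created interim tensors
// that were never disposed.
function countLeaks(fn) {
    const before = tf.memory().numTensors;
    tf.dispose(fn());
    return tf.memory().numTensors - before;
}

Running your #generateTensor through a check like this would report several leaked intermediates per image.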

For example, in your code:

        return tf.node.decodeJpeg(buffer, 3)
            .resizeNearestNeighbor([128, 128])
            .toFloat()
            .div(tf.scalar(255.0))

Each chained operation creates an interim tensor that never gets disposed of. Read it this way:

const decoded = tf.node.decodeJpeg(buffer, 3)
const resized = decoded.resizeNearestNeighbor([128, 128])
const casted = resized.toFloat();
const normalized = casted.div(tf.scalar(255.0))
return normalized;

So you have 4 large tensors allocated somewhere. What you're missing is:

tf.dispose([decoded, resized, casted]);

And later, when you're done with the image, also tf.dispose(image), which disposes normalized.

The same goes for everything that is a tensor.
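Putting this together, a manually disposed version of the helper might look like the sketch below. Note that tf.scalar(255.0) is itself a tensor that needs disposing too:

    #generateTensor = (imagePath) => {
        const buffer = fs.readFileSync(imagePath);
        const decoded = tf.node.decodeJpeg(buffer, 3);
        const resized = decoded.resizeNearestNeighbor([128, 128]);
        const casted = resized.toFloat();
        const scale = tf.scalar(255.0);
        const normalized = casted.div(scale);
        tf.dispose([decoded, resized, casted, scale]); // release intermediates
        return normalized; // the caller must dispose this one
    }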

I am aware of the built-in tfjs methods for managing memory, such as tf.tidy() and tf.dispose(). However, I was unable to implement them in a way that stops the memory leak, since the tensors are generated by the tf.data.generator function.

You say you're aware, but you're doing exactly the same thing by creating interim tensors that you're not disposing of.

You can help yourself by wrapping such functions in tf.tidy(), which creates a local scope so that everything not returned gets automatically released.

For example:

    #generateTensor = (imagePath) => tf.tidy(() => {
        const buffer = fs.readFileSync(imagePath);
        return tf.node.decodeJpeg(buffer, 3)
            .resizeNearestNeighbor([128, 128])
            .toFloat()
            .div(tf.scalar(255.0));
    });

This means the interim tensors will get disposed of, but you still need to dispose of the return value once you're done with it.
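For the tensors the generators yield, the same rule applies once the consumer is finished with them. A sketch using manual iteration over the dataset (if you train with model.fitDataset() instead, keep an eye on tf.memory().numTensors to confirm the batches are actually being released):

// Inside an async function:
const ds = new Dataset(trainPath).loadData();
await ds.forEachAsync(({xs, ys}) => {
    // ... use the batch here ...
    tf.dispose([xs, ys]); // release the batch tensors once consumed
});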
