Optimizing Universal Sentence Encoder performance in TensorFlow.js using Web Workers

Question

I am using the following code to start a Web Worker that creates embeddings with the Universal Sentence Encoder:

const initEmbeddingWorker = (filePath) => {
    let worker = new Worker(filePath);
    worker.postMessage({init: 'init'})

    worker.onmessage = (e) => {
        worker.terminate();
    }
}

Web Worker code:

onmessage = function (e) {
    if(e.data.init && e.data.init === 'init') {
        fetchData();
    }
}

const fetchData = () => {
    //fetches data from indexeddb
    createEmbedding(data, storeEmbedding);
}

const createEmbedding = (data, callback) => {
    use.load().then(model => {
        model.embed(data).then(embeddings => {
            callback(embeddings);
        })
    });
}

const storeEmbedding = (matrix) => {
    let data = matrix.arraySync();
    //store data in indexeddb
}

It takes 3 minutes to create 100 embeddings with 10 Web Workers running simultaneously, each creating embeddings for 10 sentences. That is far too slow: I need embeddings for more than 1000 sentences, which takes around 25 to 30 minutes. While this code runs, it hogs all the resources and makes the machine very slow and almost unusable.

Are there any performance optimizations that are missing?

Answer 1

Running 10 web workers only pays off if the machine has at least 11 cores: one per worker plus one for the main thread.

To get the most out of web workers, each worker should run on its own core. What happens when there are more workers than cores? The program won't be as fast as expected, because the workers compete for the same cores and a lot of time goes into switching between them instead of computing.
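As a rough sketch of that idea, the number of workers can be derived from the core count instead of being fixed at 10. In a browser the core count is available as navigator.hardwareConcurrency; the helper names pickWorkerCount and splitIntoChunks below are made up for illustration and are not part of any library.

```javascript
// Hypothetical helper: decide how many workers to start.
// `cores` would typically be navigator.hardwareConcurrency in a browser.
function pickWorkerCount(cores, jobCount) {
  // Keep one core free for the main thread, but always allow one worker.
  const usable = Math.max(1, (cores || 1) - 1);
  // Never start more workers than there are items to process.
  return Math.min(usable, Math.max(1, jobCount));
}

// Hypothetical helper: split the sentences into at most `chunkCount`
// contiguous chunks, one chunk per worker.
function splitIntoChunks(items, chunkCount) {
  const size = Math.ceil(items.length / chunkCount);
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Usage in a browser (assumption: `sentences` is an array of strings):
//   const count = pickWorkerCount(navigator.hardwareConcurrency, sentences.length);
//   const chunks = splitIntoChunks(sentences, count);
//   // start one worker per chunk and post the chunk to it
```

On a 4-core machine this would start 3 workers with larger chunks each, rather than 10 workers fighting over the same cores.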

Now let's look at what happens on each core.

arraySync is a blocking call: it prevents that thread from doing anything else until the values are available.

Instead of arraySync, the asynchronous array can be used.

const storeEmbedding = async (matrix) => {
    let data = await matrix.array();
    //store data in indexeddb
}

array and its counterpart arraySync are slower than data and dataSync, because they rebuild nested arrays from the flat values. It is better to store the flattened data, which is the output of data.

const storeEmbedding = async (matrix) => {
    let data = await matrix.data();
    //store data in indexeddb
}
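Storing the flattened output means the row structure has to be rebuilt when the embeddings are read back. A minimal sketch, assuming the tensor's shape ([rows, cols], available as matrix.shape) is stored alongside the flat values; the helper name unflatten is hypothetical and not part of TensorFlow.js.

```javascript
// Hypothetical helper: rebuild one row per sentence from the flat values
// returned by tensor.data(), using the stored [rows, cols] shape.
function unflatten(flat, rows, cols) {
  if (flat.length !== rows * cols) {
    throw new Error("flat length does not match rows * cols");
  }
  const out = [];
  for (let r = 0; r < rows; r++) {
    // slice works on both plain arrays and typed arrays such as Float32Array
    out.push(flat.slice(r * cols, (r + 1) * cols));
  }
  return out;
}

// Usage (assumption: `record` is what was stored in IndexedDB):
//   const record = { shape: matrix.shape, values: await matrix.data() };
//   const rowsBack = unflatten(record.values, record.shape[0], record.shape[1]);
```

This keeps the expensive nesting step out of the embedding loop and only pays for it when a stored embedding is actually needed.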
