[Node.js, MongoDB/Mongoose]: Generating a unique list of tags using indexOf is very slow

I have two models: a video model and a global statistics model. The video model stores tags as an array of strings. The global statistics model stores an array of tagCountSchema subdocuments, each containing a tag and a count.

I am writing a function that deletes and rebuilds the global statistic documents using data from the video documents. This includes rebuilding the list of unique tags and their counts in the global statistics document.

const mongoose = require('mongoose');

const videoSchema = new mongoose.Schema({
    tags: [{ type: String }],
});

const tagCountSchema = new mongoose.Schema({
    tag: { type: String, required: true },
    count: { type: Number, default: 1 },
}, { _id: false });

const statisticSchema = new mongoose.Schema({
    is: { type: String, default: 'global' },
    tags: [tagCountSchema],
});

const Statistic = mongoose.model('Statistic', statisticSchema);
const Video = mongoose.model('Video', videoSchema);

// Rebuild the statistics document (this code runs inside an async function)
let statistics = await Statistic.findOne({ is: 'global' });
let videos = await Video.find({});

let map = statistics.tags.map(e => e.tag);

for (let video of videos) {
    for (let tag of video.tags) {
        const index = map.indexOf(tag);
        if (index === -1) {
            statistics.tags.push({ tag: tag, count: 1 });
            map.push(tag);
        } else {
            statistics.tags[index].count++;
        }
    }
}

await statistics.save();

However, the indexOf() calls in the function above make rebuilding the statistics very slow. indexOf() scans the array linearly, and because the videos have many unique tags, the array of unique tags on the global statistics document becomes very long. Since indexOf() is called once for every tag of every video, the total running time grows roughly with the number of tag occurrences multiplied by the number of unique tags.
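As a rough back-of-the-envelope bound (an estimate, not a measurement): let T be the total number of tag occurrences across all videos and U the number of unique tags. Each indexOf() call may scan the whole unique-tag array, while a hash-based key lookup takes constant time on average:

```latex
\underbrace{\sum_{k=1}^{T} O(U)}_{\text{indexOf}} = O(T \cdot U)
\qquad\text{vs.}\qquad
\underbrace{\sum_{k=1}^{T} O(1)}_{\text{key lookup}} = O(T)
```

For example, with a hypothetical 1,000,000 tag occurrences and 100,000 unique tags, the indexOf() version may perform up to about 10^11 string comparisons in the worst case, versus roughly 10^6 lookups.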

I tested a version of this function that stored tags as an object in the database and used Object.keys to update the tags in the statistics document. This was an order of magnitude faster, but I have come to realize that storing tag names directly as object keys in the database would cause problems whenever a tag name is not a legal database key.

It is also technically possible to stringify the tags object before storing it, but that is not convenient given how this function is used elsewhere in my code. As the function loops through the videos, it also updates similar statistics for other documents (such as the uploader), which I have left out of the code for simplicity's sake. This would mean stringifying and parsing the object for every video.
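One way around the stringify concern is to keep the working counts out of the database format during the loop entirely: hold them in in-memory Map objects (which accept any string as a key), update them for every video, and convert each Map to the [{ tag, count }] array shape only once, right before saving. A minimal sketch, assuming each video has a tags array as in videoSchema plus a hypothetical uploader string field standing in for the other statistics; bumpCount, toCountArray and buildStatistics are illustrative names, not part of Mongoose.

```javascript
// Increment the count stored under `key` in a Map, starting from 0.
function bumpCount(counts, key) {
    counts.set(key, (counts.get(key) || 0) + 1);
}

// Convert a Map of counts into an array of { [keyName]: key, count } objects,
// e.g. the { tag, count } shape used by tagCountSchema.
function toCountArray(counts, keyName) {
    return Array.from(counts, ([key, count]) => ({ [keyName]: key, count }));
}

// Count everything in memory first, then build the arrays once at the end.
function buildStatistics(videos) {
    const tagCounts = new Map();
    const uploaderCounts = new Map(); // hypothetical second statistic

    for (const video of videos) {
        for (const tag of video.tags) {
            bumpCount(tagCounts, tag);
        }
        if (video.uploader) { // `uploader` is an assumed field, not in videoSchema
            bumpCount(uploaderCounts, video.uploader);
        }
    }

    return {
        tags: toCountArray(tagCounts, 'tag'),
        uploaders: toCountArray(uploaderCounts, 'uploader'),
    };
}

const stats = buildStatistics([
    { tags: ['cats', 'funny'], uploader: 'alice' },
    { tags: ['cats'], uploader: 'bob' },
]);
// stats.tags is [{ tag: 'cats', count: 2 }, { tag: 'funny', count: 1 }]
```

Because this counts from zero, it matches a full delete-and-rebuild. Before saving you would assign the result back, for example statistics.tags = stats.tags, and then call await statistics.save(); Mongoose casts the plain objects to tagCountSchema subdocuments.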

How can I improve the speed of this function?

Maybe your approach is not ideal. It would be simpler to update your statistics as you register your videos.

That way you avoid rebuilding the tag index altogether. Alternatively, you can use a queue to apply the updates.

This is the way I would do it.
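A minimal sketch of the incremental approach, assuming Mongoose 6 or later (where Model.updateOne() resolves to a result with a matchedCount field) and the Statistic model and { is: 'global' } document from the question. Each new video's tags are applied with MongoDB's $inc and $push update operators instead of a rebuild; incrementTag and recordVideoTags are hypothetical helper names.

```javascript
// Increment one tag's count on the global statistics document, adding the
// tag with count 1 if it is not there yet.
async function incrementTag(StatisticModel, tag) {
    // 1) Try to bump an existing { tag, count } entry via the positional $ operator.
    const inc = await StatisticModel.updateOne(
        { is: 'global', 'tags.tag': tag },
        { $inc: { 'tags.$.count': 1 } }
    );
    if (inc.matchedCount > 0) return;

    // 2) Not found: append it, but only if no element with this tag exists yet.
    const push = await StatisticModel.updateOne(
        { is: 'global', 'tags.tag': { $ne: tag } },
        { $push: { tags: { tag, count: 1 } } }
    );
    if (push.matchedCount > 0) return;

    // 3) Another writer added the tag between steps 1 and 2: increment it instead.
    await StatisticModel.updateOne(
        { is: 'global', 'tags.tag': tag },
        { $inc: { 'tags.$.count': 1 } }
    );
}

// Apply every tag of a newly saved video, counting each occurrence like the question's loop.
async function recordVideoTags(StatisticModel, tags) {
    for (const tag of tags) {
        await incrementTag(StatisticModel, tag);
    }
}

// Usage, e.g. right after saving a new video:
// await recordVideoTags(Statistic, video.tags);
```

The $ne filter on the $push keeps two concurrent writers from appending the same tag twice. This assumes the global document already exists; if it does not, all three calls match nothing. It costs at least one round trip per tag, so it suits per-video updates rather than a full rebuild.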

I have come across a similar problem in my line of work and had to get a little creative to avoid indexOf() (in my case, find()), because, as you already know, it is very expensive here: each call scans the array from the start, and you make one call per tag.

On the other hand, looking up a key in an object takes roughly constant time on average, no matter how many keys it holds. So I would rewrite the code that builds the statistics document like this:

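// Object.create(null) has no prototype, so `in` only finds tags we added
const map = Object.create(null);
statistics.tags.forEach((e, i) => {
    map[e.tag] = i; // position of each existing tag in statistics.tags
});

for (const video of videos) {
    for (const tag of video.tags) {
        if (tag in map) {
            statistics.tags[map[tag]].count++;
        } else {
            // The new entry lands at the current end of the array.
            // statistics.tags.length is O(1); Object.keys(map).length would copy every key.
            map[tag] = statistics.tags.length;
            statistics.tags.push({ tag, count: 1 });
        }
    }
}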
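A note on the lookup object: with a plain {} literal, the in operator also finds properties inherited from Object.prototype, so a video tagged "constructor" or "toString" would be treated as already counted, and statistics.tags[map[tag]].count++ would throw because map[tag] is then a function rather than an index. Creating the object with Object.create(null), or using a Map, avoids this. The snippet below demonstrates the difference in plain JavaScript, independent of Mongoose.

```javascript
// With a plain object literal, `in` also sees properties inherited from Object.prototype.
const plain = {};
console.log('constructor' in plain); // true, even though no tag was ever stored
console.log('toString' in plain);    // true

// An object created with Object.create(null) has no prototype, so only stored keys match.
const bare = Object.create(null);
bare.cats = 0;
console.log('constructor' in bare);  // false
console.log('cats' in bare);         // true

// A Map avoids the issue entirely and exposes has/get/set explicitly.
const index = new Map([['cats', 0]]);
console.log(index.has('constructor')); // false
```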

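Replacing the linear indexOf() scan with a key lookup turns each check into a constant-time operation on average, which will significantly speed up your nested loop.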
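To check the speed-up on your own machine, here is a rough, self-contained benchmark sketch on synthetic data. The video and tag counts are made up, and timings vary by machine and Node version, so no numbers are quoted here. It swaps in a Map for the plain-object lookup table; the idea is the same key-based lookup described above.

```javascript
const { performance } = require('perf_hooks');

// Synthetic data (made up for this sketch): videos with random tags drawn from a pool.
function makeVideos(videoCount, tagsPerVideo, uniqueTags) {
    const videos = [];
    for (let v = 0; v < videoCount; v++) {
        const tags = [];
        for (let t = 0; t < tagsPerVideo; t++) {
            tags.push('tag' + Math.floor(Math.random() * uniqueTags));
        }
        videos.push({ tags });
    }
    return videos;
}

// Question's approach: a linear indexOf() search for every tag.
function countWithIndexOf(videos) {
    const tags = [];
    const names = [];
    for (const video of videos) {
        for (const tag of video.tags) {
            const i = names.indexOf(tag);
            if (i === -1) {
                names.push(tag);
                tags.push({ tag, count: 1 });
            } else {
                tags[i].count++;
            }
        }
    }
    return tags;
}

// Keyed lookup: a Map from tag name to its position in the result array.
function countWithMap(videos) {
    const tags = [];
    const positions = new Map();
    for (const video of videos) {
        for (const tag of video.tags) {
            const i = positions.get(tag);
            if (i === undefined) {
                positions.set(tag, tags.length);
                tags.push({ tag, count: 1 });
            } else {
                tags[i].count++;
            }
        }
    }
    return tags;
}

// Run fn once and print how long it took.
function time(label, fn) {
    const start = performance.now();
    const result = fn();
    console.log(`${label}: ${(performance.now() - start).toFixed(1)} ms`);
    return result;
}

const videos = makeVideos(5000, 10, 5000);
const a = time('indexOf', () => countWithIndexOf(videos));
const b = time('Map', () => countWithMap(videos));
console.log('same result:', JSON.stringify(a) === JSON.stringify(b));
```

Both functions emit tags in first-seen order, so their outputs can be compared directly; raise the counts in makeVideos to see the gap widen as the number of unique tags grows.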

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint this content, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
