
[Node.js, MongoDB/Mongoose]: Generating a unique list of tags using indexOf is very slow

I have two models: a video model and a global statistics model. The video model stores an array of strings for tags. The global statistics model stores an array of tagCountSchema subdocuments, each containing a tag and a count.

I am writing a function that deletes and rebuilds the global statistics document using data from the video documents. This includes rebuilding the list of unique tags and their counts in the global statistics document.

const videoSchema = new mongoose.Schema({
    tags: [{ type: String }],
});

const tagCountSchema = new mongoose.Schema({
    tag: { type: String, required: true },
    count: { type: Number, default: 1 },
}, { _id: false });

const statisticSchema = new mongoose.Schema({
    is: { type: String, default: 'global' },
    tags: [tagCountSchema],
});

const Statistic = mongoose.model('Statistic', statisticSchema);
const Video = mongoose.model('Video', videoSchema);

// Rebuild the statistics document
let statistics = await Statistic.findOne({ is: 'global' });
let videos = await Video.find({});

let map = statistics.tags.map(e => e.tag);

for (let video of videos) {    
    for (let tag of video.tags) {
        const index = map.indexOf(tag);
        if (index === -1) {
            statistics.tags.push({ tag: tag, count: 1 });
            map.push(tag);
        } else {
            statistics.tags[index].count++;
        }
    }
}

await statistics.save();

However, the use of indexOf() in the function above makes rebuilding the statistics take a very long time. Because the videos have many unique tags, the array of unique tags on the global statistics document becomes very long, and since indexOf() scans that array once for each tag of each video, the function takes a long time to complete.

I tested a version of this function that stored tags as an Object in the database and used Object.keys to update the tags in the statistics document. This was an order of magnitude faster, but I have come to realize that storing tag names directly as object keys in the database would cause issues if a tag name were illegal to use as a database key.

It is also technically possible to stringify the tags object to store it, but that is not convenient given how this function is used elsewhere in my code. As the function loops through the videos, it also updates similar statistics for other documents (such as the uploader), which I have left out of the code for simplicity's sake. This means it would need to stringify and parse the object for every video.

How can I improve the speed of this function?

Maybe your approach is not quite right. It would be simpler to update your statistics as you register your videos.

This way you avoid the index-building problem altogether. Alternatively, you can use a queue to update your data.

This is the way I would do it.
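The suggestion above, updating the statistics as each video is registered, could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names (incrementExistingTag, appendNewTag, recordVideoTags) are hypothetical, the model passed in is assumed to be the Statistic model from the question, and the check on matchedCount assumes Mongoose 6 or later, where updateOne resolves to a result object with that field.

```javascript
// Hypothetical helpers. Only updateOne, $inc, $push, $ne and the positional
// operator "$" are MongoDB/Mongoose features; everything else is illustrative.

// Filter and update that increment the count of a tag already in the array.
function incrementExistingTag(tag) {
    return {
        filter: { is: 'global', 'tags.tag': tag },
        update: { $inc: { 'tags.$.count': 1 } },
    };
}

// Filter and update that append a tag that is not yet in the array.
function appendNewTag(tag) {
    return {
        filter: { is: 'global', 'tags.tag': { $ne: tag } },
        update: { $push: { tags: { tag, count: 1 } } },
    };
}

// Call after a video has been saved. StatisticModel is assumed to be the
// Statistic model defined in the question.
async function recordVideoTags(StatisticModel, tags) {
    for (const tag of tags) {
        const inc = incrementExistingTag(tag);
        const result = await StatisticModel.updateOne(inc.filter, inc.update);
        // Assumes Mongoose 6+: matchedCount is 0 when the tag was not found.
        if (result.matchedCount === 0) {
            const add = appendNewTag(tag);
            await StatisticModel.updateOne(add.filter, add.update);
        }
    }
}
```

With this approach the tag lookup is done by the database on each registration, so no full rebuild is needed. Two concurrent registrations of the same new tag can still race, so a production version would probably retry the increment when the second call matches nothing.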

I have come across a similar problem in my line of work and had to get a little creative to avoid that indexOf function (find, in my case), because, as you already know, it is a hugely expensive operation.

On the other hand, as you may know, looking up a key in an object is pretty much instant. So I would rewrite the code that builds the statistics document like so:

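// Lookup table from tag name to its index in statistics.tags.
// Object.create(null) has no prototype, so tags such as "constructor"
// or "toString" are not mistaken for existing keys.
const map = Object.create(null);
statistics.tags.forEach((e, i) => {
    map[e.tag] = i;
});

for (let video of videos) {
    for (let tag of video.tags) {
        if (tag in map) {
            statistics.tags[map[tag]].count++;
        } else {
            statistics.tags.push({ tag, count: 1 });
            // The new entry is the last element of the array.
            map[tag] = statistics.tags.length - 1;
        }
    }
}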

This will significantly speed up your nested loop, because each tag lookup no longer scans the whole array.
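If the statistics are rebuilt from scratch, the same idea can be taken one step further: count everything in an in-memory Map and convert it to the { tag, count } array only once, at the end. The sketch below assumes that videos is an array of objects with a tags array; the helper name countTags is hypothetical and not part of Mongoose. Because tag names are never used as database keys, names such as $set or constructor are stored as ordinary string values.

```javascript
// Hypothetical helper: count tag occurrences across all videos.
function countTags(videos) {
    const counts = new Map(); // tag name -> number of occurrences
    for (const video of videos) {
        for (const tag of video.tags) {
            counts.set(tag, (counts.get(tag) || 0) + 1);
        }
    }
    // Convert to the array-of-subdocuments shape used by tagCountSchema.
    return Array.from(counts, ([tag, count]) => ({ tag, count }));
}

// Plain objects stand in for Video documents in this example.
const sampleVideos = [
    { tags: ['cats', 'funny'] },
    { tags: ['cats', '$set', 'constructor'] },
];
const tagCounts = countTags(sampleVideos);
console.log(tagCounts);
```

The resulting array could then be assigned to statistics.tags before calling statistics.save(), which keeps the stored format from the question unchanged.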


 