
MongoDB: How to implement a lookup dictionary for checking text

I would like to implement a dictionary to check the spelling of some text. The dictionary contains 20,000 words. My application (a Meteor application) will first load the text. Then I would split this text into words and check whether each of them is in the dictionary.
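Splitting on a single space keeps punctuation attached to words ("text." would not match "text") and yields empty strings wherever spaces repeat. A minimal tokenizer sketch, assuming dictionary entries consist only of letters and apostrophes; the `tokenize` name is hypothetical:

```javascript
// Hypothetical helper: extract words as runs of Unicode letters and apostrophes.
// Assumption: dictionary entries contain only letters and apostrophes.
function tokenize(text) {
  // \p{L} matches any Unicode letter; the "u" flag enables property escapes.
  // String.prototype.match returns null when nothing matches, hence the fallback.
  return text.match(/[\p{L}']+/gu) ?? [];
}

console.log(tokenize("Hello,  world! Don't stop.").join('|')); // Hello|world|Don't|stop
```

Whether the tokens should also be lowercased depends on how the dictionary stores its entries.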

But is this technically the best approach? A text with 100 words would require 100 DB calls, which doesn't feel right. On the other hand, it doesn't make sense to me to load all 20,000 words into an array just to do a lookup...

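// Assumes Dictionary is a Meteor Mongo.Collection with documents shaped like { word: '...' }.
// Note: this issues one findOne() query per word in the text.
const incorrect = [];
text.split(/\s+/).filter(Boolean).forEach(word => {
    if (!Dictionary.findOne({ word })) {
        incorrect.push(word);
    }
});

if (incorrect.length) {
    console.log('There is a spelling mistake');
} else {
    console.log('Everything seems to be correct');
}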
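For comparison, the in-memory option the question hesitates over can use a `Set` instead of an array: `Set.prototype.has` takes roughly constant time on average, whereas `Array.prototype.includes` scans the array. This sketch assumes the word list has already been fetched once (for example at startup); `buildChecker` and the sample words are hypothetical:

```javascript
// Hypothetical helper: build the lookup structure once, then reuse it for every text.
function buildChecker(dictionaryWords) {
  const known = new Set(dictionaryWords);
  // Returns the entries of `words` that are not in the dictionary
  return words => words.filter(word => !known.has(word));
}

// Hypothetical sample data standing in for the 20,000-word dictionary
const findUnknown = buildChecker(['the', 'cat', 'sat', 'on', 'mat']);
console.log(findUnknown(['the', 'cat', 'sattt', 'on', 'the', 'mat'])); // [ 'sattt' ]
```

Keeping the set in memory trades some RAM for avoiding any database round trip when a text is checked.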

Another approach I thought of is to send the array of split words in a single query and get all the missing elements back as the result (an array). But I don't know whether MongoDB can do this.

You can find, in a single query, all the words of the text that exist in the database. If the text contains 100 distinct words, there should be 100 matching documents; if there are fewer, something is wrong with the text:

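// Split on any run of whitespace and deduplicate: $in matches each dictionary
// document at most once (assuming each word is stored only once), so repeated
// words in the text would otherwise inflate wordCount
const arr = [...new Set(text.split(/\s+/).filter(Boolean))];
const wordCount = arr.length;

// One query for the whole text: count the dictionary entries matching any word in arr
const docCount = Dictionary.find({
  word: {
    $in: arr,
  },
}).count();

if (wordCount !== docCount) {
  console.log('There is a spelling mistake');
}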
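The count comparison only works if each distinct word matches exactly one document, so it assumes the dictionary stores every word once and that the word list is deduplicated before counting. The sketch below simulates the `$in` match over a plain array to show why repeated words break a raw comparison; the `dictionary` array and `countMatches` helper are hypothetical stand-ins for the collection and the query:

```javascript
// Hypothetical stand-in for the Dictionary collection: one document per word
const dictionary = [{ word: 'the' }, { word: 'cat' }, { word: 'saw' }, { word: 'dog' }];

// Simulates Dictionary.find({ word: { $in: words } }).count():
// each document is counted once, however often its word appears in `words`
function countMatches(words) {
  return dictionary.filter(doc => words.includes(doc.word)).length;
}

const words = 'the cat saw the dog'.split(' ');
const distinct = [...new Set(words)];

console.log(words.length, countMatches(words));       // 5 4 -> false alarm
console.log(distinct.length, countMatches(distinct)); // 4 4 -> no mistake
```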

Update

If you need the misspelled words themselves, simply compute the difference between the `arr` input and the words found in the DB. Assuming you have underscore installed, I use `_.difference` to get the result:

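// Assumes the underscore package is available (in Meteor: `meteor add underscore`)
import { _ } from 'meteor/underscore';

const arr = text.split(/\s+/).filter(Boolean);

// One query for the whole text, keeping only the matched words
const foundWords = Dictionary.find({
  word: {
    $in: arr,
  },
}).map(obj => obj.word);

// Words from the text that were not found in the dictionary
const misspelledWords = _.difference(arr, foundWords);

console.log(misspelledWords);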
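If underscore is not installed, the same difference can be computed in plain JavaScript. A minimal sketch with a hypothetical `difference` helper that mirrors `_.difference` for arrays of strings (values of the first array absent from the second, in original order, duplicates kept); the sample text and query result are made up:

```javascript
// Hypothetical replacement for _.difference(arr, foundWords) on string arrays:
// keeps the entries of `words` that are not in `found`, in their original order
function difference(words, found) {
  const foundSet = new Set(found); // Set lookups avoid rescanning `found` for every word
  return words.filter(word => !foundSet.has(word));
}

const arr = 'Ths text has a speling mistake'.split(' ');
// Hypothetical result of the $in query: the words the dictionary contained
const foundWords = ['text', 'has', 'a', 'mistake'];

console.log(difference(arr, foundWords)); // [ 'Ths', 'speling' ]
```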

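Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.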

 
© 2020-2024 STACKOOM.COM