How can I speed up my array search function?

I am working on a dictionary application written in React Native.

To filter the array from the search box, I wrote the function below. It works quite well when I test it with a 2,000-word list. But when the word list grows beyond that, into many thousands of words, the search becomes really slow.

So, how can I improve this search function?

// Filter the array as the user types in the search box

let filteredWords = []
if(this.state.searchField != null)
{
  filteredWords = this.state.glossaries.filter(glossary => {
    return glossary.word.toLowerCase().includes(this.state.searchField.toLowerCase());
  })
}

As the question doesn't seem to belong on Code Review, here are a few things you can do to make your code drastically faster [citation needed]:

  • Cache the call to this.state.searchField.toLowerCase(), since you don't need to repeat it on every iteration.
  • Use regular old for loops instead of flashy-but-slow Array functions.

And here is the final result:

let filteredWords = []
if(this.state.searchField != null) {
    let searchField = this.state.searchField.toLowerCase(),
        theArray = this.state.glossaries;                          // cache this too

    for(let i = 0, l = theArray.length; i < l; ++i) {
        if(theArray[i].word.toLowerCase().includes(searchField)) {
            filteredWords.push(theArray[i]);
        }
    }
}

Edit:

If you want to match only glossaries whose word starts with searchField, use indexOf(...) === 0 instead of includes as the condition, like this:

if(theArray[i].word.toLowerCase().indexOf(searchField) === 0) {
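A possible alternative for the prefix case, not part of the original answer: String.prototype.startsWith states the intent directly, and it only has to compare the first few characters, whereas indexOf(...) === 0 may keep searching the rest of the word for a later occurrence before returning. A minimal sketch, assuming each glossary entry is an object with a word string as in the question; filterByPrefix is a hypothetical helper name.

```javascript
// Return the glossary entries whose word starts with searchField,
// ignoring case. Assumes entries look like { word: string }.
function filterByPrefix(glossaries, searchField) {
  const filtered = [];
  if (searchField == null) {
    return filtered; // mirrors the null check in the question
  }

  // Lowercase the search text once, outside the loop
  const prefix = searchField.toLowerCase();

  for (let i = 0; i < glossaries.length; i++) {
    // startsWith only compares the first prefix.length characters
    if (glossaries[i].word.toLowerCase().startsWith(prefix)) {
      filtered.push(glossaries[i]);
    }
  }
  return filtered;
}

// Usage
const words = [{ word: 'Apple' }, { word: 'apricot' }, { word: 'banana' }];
console.log(filterByPrefix(words, 'AP').map(g => g.word)); // → ['Apple', 'apricot']
```

Note that a word such as "grape" contains "ap" but does not start with it, so it is excluded here while the includes version would keep it.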

Several factors are making this code slow:

  • You're using filter() with a lambda. This adds function call overhead for each item being searched.
  • You're calling toLowerCase() on both strings before calling includes(). This allocates two new string objects for every comparison.
  • You're calling includes(). For some reason, includes() is not as well optimized as indexOf() in some browsers.

for loop (-11%)

Instead of using the filter() method, I recommend creating a new array and filling it with a for loop.

const glossaries = this.state.glossaries;
const searchField = this.state.searchField;
const filteredWords = [];

for (let i = 0; i < glossaries.length; i++) {
  // Each entry is an object, so compare its .word property
  if (glossaries[i].word.toLowerCase().includes(searchField.toLowerCase())) {
    filteredWords.push(glossaries[i]);
  }
}

toLowerCase allocations (-45%)

Memory allocation is expensive because JavaScript reclaims unused memory with a garbage collector. When a garbage collection runs, the program can be paused while the collector looks for memory that is no longer in use.

You can remove the toLowerCase() call on each word (inside the search loop) completely by keeping a lowercased copy of the glossary and rebuilding it every time the glossary is updated, which I assume is not often.

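// When you build the glossary, precompute the lowercased words once
// (in a real component, store both arrays with setState())
this.state.glossaries = ...;
this.state.searchGlossaries = this.state.glossaries.map(g => g.word.toLowerCase());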

You can also remove the per-iteration toLowerCase() on searchField by calling it once before the loop. After these changes, the code looks like this:

const glossaries = this.state.glossaries;
const searchGlossaries = this.state.searchGlossaries;
const searchField = this.state.searchField.toLowerCase();
const filteredWords = [];

for (let i = 0; i < glossaries.length; i++) {
  if (searchGlossaries[i].includes(searchField)) {
    filteredWords.push(glossaries[i]);
  }
}
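The same idea can be packaged so that the lowercased copy is rebuilt only when the glossary changes, and each keystroke reuses it. A sketch, assuming entries are { word: string } objects; buildSearchIndex and search are hypothetical names. In a React component you would typically call the builder where the glossary is loaded, rather than on every search.

```javascript
// Build the lowercased copy once, when the glossary is loaded or changed.
// Returns a search function that reuses it on every keystroke.
function buildSearchIndex(glossaries) {
  const searchGlossaries = glossaries.map(g => g.word.toLowerCase());

  return function search(searchField) {
    const filtered = [];
    if (searchField == null) {
      return filtered;
    }
    const needle = searchField.toLowerCase(); // once per search, not per word

    for (let i = 0; i < searchGlossaries.length; i++) {
      // Substring match against the precomputed lowercased word
      if (searchGlossaries[i].includes(needle)) {
        filtered.push(glossaries[i]);
      }
    }
    return filtered;
  };
}

// Usage
const search = buildSearchIndex([{ word: 'Apple' }, { word: 'Pineapple' }, { word: 'Banana' }]);
console.log(search('APP').map(g => g.word)); // → ['Apple', 'Pineapple']
```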

indexOf() instead of includes() (-13%)

I am not really sure why this is the case, but in my tests indexOf was a lot faster than includes.

const glossaries = this.state.glossaries;
const searchGlossaries = this.state.searchGlossaries;
const searchField = this.state.searchField.toLowerCase();
const filteredWords = [];

for (let i = 0; i < glossaries.length; i++) {
  if (searchGlossaries[i].indexOf(searchField) !== -1) {
    filteredWords.push(glossaries[i]);
  }
}

Overall, performance improved by about 70%. I got the percentages from https://jsperf.com/so-question-perf
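Percentages measured in a browser may not carry over to the JavaScript engine a React Native app actually runs on, so it can be worth re-measuring there. A rough sketch, assuming performance.now() is available (with a Date.now() fallback); the word list is synthetic and timeIt, withFilter and withLoop are hypothetical names. Micro-benchmarks like this are noisy, so treat the numbers as indicative only.

```javascript
// Pick a timer that exists in the current environment
const now = typeof performance !== 'undefined'
  ? () => performance.now()
  : () => Date.now();

// Run fn `runs` times and return the average milliseconds per run
function timeIt(fn, runs) {
  fn(); // warm-up run so the engine can optimize fn first
  const start = now();
  for (let r = 0; r < runs; r++) {
    fn();
  }
  return (now() - start) / runs;
}

// Synthetic glossary: 50,000 entries like { word: 'Word12345' }
const glossaries = [];
for (let i = 0; i < 50000; i++) {
  glossaries.push({ word: 'Word' + i });
}
const searchGlossaries = glossaries.map(g => g.word.toLowerCase());
const searchField = 'ORD123';

// Variant from the question: filter + toLowerCase on every comparison
function withFilter() {
  return glossaries.filter(g =>
    g.word.toLowerCase().includes(searchField.toLowerCase()));
}

// Variant from this answer: for loop, precomputed copy, indexOf
function withLoop() {
  const needle = searchField.toLowerCase();
  const result = [];
  for (let i = 0; i < glossaries.length; i++) {
    if (searchGlossaries[i].indexOf(needle) !== -1) {
      result.push(glossaries[i]);
    }
  }
  return result;
}

console.log('filter:', timeIt(withFilter, 20).toFixed(2), 'ms');
console.log('loop:  ', timeIt(withLoop, 20).toFixed(2), 'ms');
```

Both variants return the same entries in the same order, so any difference in timing comes from how the search is done rather than what it finds.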

Optimize the algorithm

In the comments you said you would like an example of the optimizations that become possible when the requirements are loosened to match only words that start with the search text. One way to do this is a binary search.

Let's take the code from above as a starting point. We sort the glossaries before storing them in the state. To sort case-insensitively, JavaScript provides the Intl.Collator constructor. Its compare(x, y) method returns:

return value    | meaning
negative value  | x is less than y
zero            | x is equal to y
positive value  | x is greater than y
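To see what sensitivity: 'base' means in practice, here is a small sketch: with this setting, strings that differ only in case or accents compare as equal, which is why the sorted list can be searched without lowercasing first. This assumes the engine ships Intl locale data; Intl support in React Native depends on the JavaScript engine and version, so it is worth checking on your target.

```javascript
// A collator that ignores case and accents ('base' sensitivity)
const collator = new Intl.Collator(undefined, { sensitivity: 'base' });

console.log(collator.compare('apple', 'Apple'));      // 0: case is ignored
console.log(collator.compare('resume', 'résumé'));    // 0: accents are ignored
console.log(collator.compare('apple', 'banana') < 0); // true
console.log(collator.compare('banana', 'apple') > 0); // true

// collator.compare is already bound, so it can be passed to sort directly
const words = ['banana', 'Apple', 'cherry', 'apricot'];
words.sort(collator.compare);
console.log(words); // → ['Apple', 'apricot', 'banana', 'cherry']
```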

And the resulting code:

// Static in the file
const collator = new Intl.Collator(undefined, {
  sensitivity: 'base'
});

// Compare only the first searchText.length characters of the word, so every
// word that starts with searchText compares as "equal". This assumes collation
// order agrees with prefixes, which holds for ordinary alphabetic words.
function comparePrefix(word, searchText) {
  return collator.compare(word.slice(0, searchText.length), searchText);
}

function binarySearch(glossaries, searchText) {
  let lo = 0;
  let hi = glossaries.length - 1;

  while (lo <= hi) {
    const mid = (lo + hi) / 2 | 0;
    const comparison = comparePrefix(glossaries[mid].word, searchText);

    if (comparison < 0) {
      lo = mid + 1;
    }
    else if (comparison > 0) {
      hi = mid - 1;
    }
    else {
      return mid;
    }
  }

  return -1;
}

// When you build the glossary
this.state.glossaries = ...;
this.state.glossaries.sort(function(x, y) {
  return collator.compare(x.word, y.word);
});

// When you search (the collator already ignores case, so no toLowerCase())
const glossaries = this.state.glossaries;
const searchField = this.state.searchField;
const filteredWords = [];

let idx = binarySearch(glossaries, searchField);

if (idx !== -1) {
  // The binary search can land anywhere inside the block of matching words,
  // so walk back to the first match
  while (idx > 0 && comparePrefix(glossaries[idx - 1].word, searchField) === 0) {
    idx--;
  }

  // Add each matching word to filteredWords
  while (idx < glossaries.length && comparePrefix(glossaries[idx].word, searchField) === 0) {
    filteredWords.push(glossaries[idx]);
    idx++;
  }
}
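A variation on the same idea, sketched under different assumptions: instead of finding any match and walking backwards, a lower-bound binary search jumps straight to the first candidate, so a search costs O(log n) plus the number of matches. This version keeps a sorted array of pre-lowercased keys and compares with plain <, i.e. UTF-16 code-unit order rather than locale order. That guarantees all words sharing a prefix sit next to each other, at the cost of locale-aware ordering (accents are not folded). buildPrefixIndex, lowerBound and findByPrefix are hypothetical names, and entries are assumed to be { word: string } objects.

```javascript
// Sort once, when the glossary is loaded: pairs of (lowercased key, entry)
function buildPrefixIndex(glossaries) {
  const index = glossaries.map(g => ({ key: g.word.toLowerCase(), entry: g }));
  // Plain code-unit comparison, consistent with the < used in lowerBound
  index.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return index;
}

// Lower bound: first position whose key is >= prefix
function lowerBound(index, prefix) {
  let lo = 0;
  let hi = index.length; // search window is [lo, hi)
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (index[mid].key < prefix) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// All entries whose word starts with searchField, case-insensitively
function findByPrefix(index, searchField) {
  const prefix = searchField.toLowerCase();
  const result = [];
  // Keys starting with prefix form one contiguous run beginning at the
  // lower bound, so stop at the first key that does not match
  for (let i = lowerBound(index, prefix); i < index.length; i++) {
    if (!index[i].key.startsWith(prefix)) {
      break;
    }
    result.push(index[i].entry);
  }
  return result;
}

// Usage
const index = buildPrefixIndex([
  { word: 'Banana' }, { word: 'apple' }, { word: 'Apricot' }, { word: 'grape' }
]);
console.log(findByPrefix(index, 'AP').map(e => e.word)); // → ['apple', 'Apricot']
```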
