
How should I represent data for effective searching and comparing strings

I have two arrays, each of length 300. They look like this (JSON representation):

[
    [
        ["word1",0.000199],
        ["word2",0.000102],
          ...
        ["word15",0.000102]
    ],
      ...
    [
        ["anotherword1",0.0032199],
        ["anotherword2",0.032302],
          ...
        ["anotherword15",0.0320102]
    ]
]

And I have this brute-force algorithm:

for(var i = 0; i < 300; i++)
{
    for(var j = 0; j < 15; j++)
    {
        for(var ii = i + 1; ii < 300; ii++)
        {
            for(var jj = 0; jj < 15; jj++)
            {
                for(var jjj = 0; jjj < 15; jjj++)
                {
                    if(new_keywords[i][j][0] === new_keywords[ii][jj][0] && new_keywords[ii][jj][0] === state_keywords[i][jjj][0])
                    {
                        console.log(0);
                    }
                }
            }
        }
    }
}

I need to find words that appear in both arrays; when the words match, I sum the values, divide the sum by 3, and store the result in the state_keywords array. So for each word that occurs more than once, I end up with the mean of its values.

Now... my approach is very bad because it takes about 300 million iterations, which is crazy. I need a better representation of my arrays in JavaScript, something like a lexicographical tree or a kd-tree.

Thank you.

EDIT:

Here is an example: http://jsfiddle.net/dD7yB/1/

EDIT2:

I'm sorry if I wasn't clear enough. Here is exactly what I'm doing:

  • I have an array state_keywords. Its indexes run from 0 to 299 and each one represents a theme.
  • Each theme may be represented by 15 words, and these may differ every time a new_keywords array arrives.
  • When a new_keywords array arrives, I need to check whether each word in it is already in the state_keywords array at the same theme index.
  • If it is: add the probabilities together and divide by 2.
  • If it is not: add the new word to the state_keywords array, BUT if there are then more than 15 words for one theme (which is currently the case), I need to keep just the top 15 sorted by probability.

I need to do this as efficiently as possible, because it has to run every second, so it must be FAST.

EDIT3:

Now I use this code:

var i, j, jj, l;
for(i = 0; i < 300; i++)
{
    for(j = 0; j < 15; j++)
    {
        l = new_keywords[i].length;
        for(jj = 0; jj < l; jj++)
        {
            if(state_keywords[i][j][0] === new_keywords[i][jj][0])
            {  
                state_keywords[i][j][1] = (state_keywords[i][j][1] + new_keywords[i][jj][1]) / 2;
            }
        }
    }
}

which is much faster than the previous one.

Why don't you make those arrays into objects with the strings as keys to the values? Then you can just look up the words directly and get the values.

var wordlists = [
    {
        "word1":0.000199,
        "word2":0.000102,
          ...
        "word15":0.000102
    },
      ...
    {
        "anotherword1":0.0032199,
        "anotherword2":0.032302,
          ...
        "anotherword15":0.0320102
    }
]

and then look up a value with

wordlists[0]["word2"]  //0.000102
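
To tie this back to the update step described in EDIT2, here is a rough sketch of how one theme could be merged with such objects. The helper names toLookup and mergeTheme and the limit parameter are made up for illustration and are not from the question. The sketch assumes each theme's state is kept as a prototype-free object, and it rebuilds the object whenever it grows past the limit.

```javascript
// Hypothetical helper: turn [["word", p], ...] into a lookup object.
function toLookup(pairs) {
    var lookup = Object.create(null); // no inherited keys such as "constructor"
    for (var i = 0; i < pairs.length; i++) {
        lookup[pairs[i][0]] = pairs[i][1];
    }
    return lookup;
}

// Hypothetical helper: merge one theme's new [word, probability] pairs into
// its state lookup, then keep only the `limit` most probable words.
function mergeTheme(state, incoming, limit) {
    for (var k = 0; k < incoming.length; k++) {
        var word = incoming[k][0];
        var prob = incoming[k][1];
        // Known word: average old and new probability. New word: just add it.
        state[word] = (word in state) ? (state[word] + prob) / 2 : prob;
    }
    var words = Object.keys(state);
    if (words.length <= limit) {
        return state;
    }
    // Too many words: sort by probability, highest first, and keep the top ones.
    words.sort(function (a, b) { return state[b] - state[a]; });
    var trimmed = Object.create(null);
    for (var n = 0; n < limit; n++) {
        trimmed[words[n]] = state[words[n]];
    }
    return trimmed;
}

// Usage with made-up numbers and a limit of 2 instead of 15.
var state = toLookup([["word1", 0.5], ["word2", 0.25]]);
state = mergeTheme(state, [["word2", 0.75], ["word3", 0.125]], 2);
console.log(Object.keys(state).sort()); // word1 and word2 remain, word3 is dropped
```

Each word in the incoming list then costs one property lookup instead of a scan over 15 entries, and the sort only runs when a theme actually exceeds the limit. Whether this beats the nested loops for lists this small is something you would have to measure.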
