Find when words appear next to each other in an array

I have a huge array of strings (words) that I'm analyzing for patterns.

I want to create a function to:

  • Identify when words appear next to each other, in a specific order, more than once.
  • For each instance where the words appear in that order, combine them into a single array element.

Example

Given the following array:

let array = ["john", "smith", "says", "that", "a", "lock", "smith", "can", "open", "the", "lock", "unlike", "john", "smith"]

Desired Result:

["john smith", "says", "that", "a", "lock", "smith", "can", "open", "the", "lock", "unlike", "john smith"]

Ideally the function identifies more than just 2-word combinations (i.e., it identifies when the combination of "white", "house", "press", "secretary" appears more than once).

I'm struggling with the logic and don't have much to show. I've also looked for a solution in a library like underscore.js, without luck.

Build a "dictionary" of all words and their immediate successors. Then loop through the original array and, for each element, check whether all of its dictionary entries match (that is, the word is always followed by the same word). If so, combine the two words and skip the immediate successor.

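    var arr = ["john", "smith", "says", "that", "a", "lock", "smith", "can", "open", "the", "lock", "unlike", "john", "smith"];

    function combineCommon(arr) {
      // Map each word to the list of words that immediately follow it.
      // Object.create(null) avoids clashes with inherited keys such as "constructor".
      var dictionary = Object.create(null);
      for (var a = 0; a < arr.length - 1; a++) {
        var A = arr[a];
        if (dictionary[A] == void 0) {
          dictionary[A] = [];
        }
        dictionary[A].push(arr[a + 1]);
      }

      var res = [];
      for (var index = 0; index < arr.length; index++) {
        var element = arr[index];
        // A word that only occurs as the last element has no recorded successors.
        var successors = dictionary[element] || [];
        var pass = false;
        // Combine only if the word is followed more than once, always by the
        // same word, and a successor actually exists at this position.
        if (successors.length > 1 && index < arr.length - 1) {
          if (successors.some(function(a) { return a != successors[0]; }) == false) {
            pass = true;
          }
        }
        if (pass) {
          res.push(element + " " + successors[0]);
          index++; // skip the successor that was just merged
        } else {
          res.push(element);
        }
      }
      return res;
    }

    console.log(combineCommon(arr));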

You could count the pairs and check for pairs when reassembling the result.

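    var array = ["john", "smith", "says", "that", "a", "lock", "foo", "bar", "baz", "smith", "can", "open", "foo", "bar", "baz", "the", "lock", "unlike", "john", "smith"],
        count = Object.create(null),
        result;

    // Count every pair of adjacent words, keyed as "first second".
    array.forEach(function (a, i, aa) {
      var key = aa.slice(i, i + 2).join(' ');
      count[key] = (count[key] || 0) + 1;
    });

    result = array.reduce(function (r, a, i, aa) {
      var key = aa.slice(i, i + 2).join(' ');
      if (count[key] > 1) {
        a = key; // this word starts a repeated pair: emit the pair
      } else if (count[aa.slice(i - 1, i + 1).join(' ')] > 1) {
        a = [];  // this word was already emitted as the second half of a pair
      }
      return r.concat(a);
    }, []);

    // Note: overlapping repeated pairs are both emitted,
    // so "foo", "bar", "baz" becomes "foo bar", "bar baz".
    console.log(result);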
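
The question also asks about sequences longer than two words, which the pair-counting approach does not cover directly. One possible way to extend it is sketched below: count every run of 2 to `maxLen` words, then walk the array and greedily merge the longest run that occurs more than once. The function name `combineRepeated` and the `maxLen` parameter are made up for this illustration, and the sketch assumes that no individual word contains a space (the join separator). Overlapping occurrences are counted as repeats, which may or may not be what you want.

```javascript
// Hypothetical helper (not from the answers above): merge repeated word
// sequences of length 2..maxLen, preferring the longest match at each position.
function combineRepeated(words, maxLen = 4) {
  // Count every n-gram for n = 2..maxLen, keyed by its space-joined text.
  const count = new Map();
  for (let n = 2; n <= maxLen; n++) {
    for (let i = 0; i + n <= words.length; i++) {
      const key = words.slice(i, i + n).join(" ");
      count.set(key, (count.get(key) || 0) + 1);
    }
  }

  const result = [];
  let i = 0;
  while (i < words.length) {
    let merged = false;
    // Try the longest candidate first so 4-word runs win over 2-word runs.
    for (let n = Math.min(maxLen, words.length - i); n >= 2; n--) {
      const key = words.slice(i, i + n).join(" ");
      if (count.get(key) > 1) {
        result.push(key);
        i += n; // skip the words that were just merged
        merged = true;
        break;
      }
    }
    if (!merged) {
      result.push(words[i]);
      i++;
    }
  }
  return result;
}

console.log(combineRepeated(
  ["the", "white", "house", "press", "secretary", "said", "white", "house", "press", "secretary"]
));
```

Counting all n-grams up front costs roughly `maxLen` passes over the array, so for a huge input it is worth keeping `maxLen` small.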

Please check this. 请检查一下。

    var data = ["john", "smith", "says", "that", "a", "lock", "smith", "can", "open", "the", "lock", "unlike", "john", "smith"];
    var result = [];
    var flag = 0;
    var n = data.length;
    var k = 0;
    var next_word, temp_word;

    // Outer main for loop.
    for (var i = 0; i < n; i++) {
      // Get next word (undefined for the last element).
      next_word = data[i + 1];
      flag = 0;

      // Inner for loop: count how often the pair (data[i], next_word) occurs.
      // Stop at n - 1 so that data[j + 1] always exists.
      for (var j = 0; j < n - 1; j++) {
        // john == john && smith == smith
        // smith == john && smith == smith
        // ..
        if (data[j] == data[i] && data[j + 1] == next_word) {
          flag++;
          temp_word = data[i] + ' ' + next_word;
        }
      }

      // If flag is more than 1, the same word sequence was found more than once.
      if (flag > 1) {
        result[k++] = temp_word; // Assign temp_word to the result array.
        i++; // Advance the outer loop by one so the second word is not added again.
      } else {
        // If no repeated sequence was found, copy the word to the result as it is.
        result[k++] = data[i];
      }
    }

    console.log(result);
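
The nested loops above rescan the whole array for every word, which may be slow on a huge array. As a rough sketch of the same idea with a single counting pass, the pair counts can be stored in a `Map` first and looked up while building the result. The function name `mergeRepeatedPairs` is made up for this example, and it assumes no word contains a space.

```javascript
// Hypothetical variant of the nested-loop answer: count pairs once, then merge.
function mergeRepeatedPairs(data) {
  // First pass: count each adjacent pair, keyed as "first second".
  const pairCount = new Map();
  for (let i = 0; i + 1 < data.length; i++) {
    const pair = data[i] + " " + data[i + 1];
    pairCount.set(pair, (pairCount.get(pair) || 0) + 1);
  }

  // Second pass: emit a merged pair where it repeats, otherwise the word itself.
  const result = [];
  for (let i = 0; i < data.length; i++) {
    const pair = i + 1 < data.length ? data[i] + " " + data[i + 1] : null;
    if (pair !== null && pairCount.get(pair) > 1) {
      result.push(pair);
      i++; // skip the second word of the merged pair
    } else {
      result.push(data[i]);
    }
  }
  return result;
}

console.log(mergeRepeatedPairs(
  ["john", "smith", "says", "that", "a", "lock", "smith", "can", "open", "the", "lock", "unlike", "john", "smith"]
));
```

Because the outer index jumps past a merged pair, a word is never emitted both as the end of one pair and the start of the next.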
