Find where words from an array are next to each other in a string

Question

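Suppose I have one or two sentences in a string, and I have an array of words. I need to find every place in the string where two or more words from the array are next to each other.

Example:

Words: ['cat','dog','and','the']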

String: There is a dog and cat over there. The cat likes the dog.

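Result: ['dog and cat','the dog','the cat']

The only way I have been able to do this is by manually specifying the possible combinations, but only for up to 3 words, because it gets long very quickly.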

Answer 1

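You can walk the array with two pointers that track the start and end of each run of words contained in the words array. Here the string is first converted to an array of lowercase words with punctuation removed (you will need to extend the set of characters being removed).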

const words = ['cat', 'dog', 'and', 'the'],
  string = 'There is a dog and cat over there. The cat likes the dog.';

// Lowercase, strip periods and commas, then split into words.
let stringArray = string.toLowerCase().replace(/[.,]/g, '').split(' '),
  start = 0,
  end = 0,
  result = [];

while (start < stringArray.length) {
  if (words.includes(stringArray[start])) {
    // Advance `end` past every consecutive word that is in `words`.
    end = start + 1;
    while (words.includes(stringArray[end])) {
      end++
    }
    // Keep only runs of two or more words.
    if (end - start >= 2) {
      result.push(stringArray.slice(start, end).join(' '));
    }
    start = end;
  }
  start++
}

console.log(result)
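
One possible way to "extend the characters to remove" is sketched below. It assumes that dropping every Unicode punctuation character is acceptable for your input; the `tokenize` helper is a hypothetical name and is not part of the original answer.

```javascript
// Hypothetical helper (an assumption, not from the original answer):
// turn a string into lowercase words with all punctuation removed.
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/\p{P}/gu, '') // \p{P} matches any Unicode punctuation character
    .split(/\s+/)           // split on runs of whitespace
    .filter(Boolean);       // drop empty tokens from leading/trailing spaces
}

const tokens = tokenize('Is the dog, and the cat, over there?!');
console.log(tokens);
// [ 'is', 'the', 'dog', 'and', 'the', 'cat', 'over', 'there' ]
```

Note that this also removes apostrophes and hyphens, so "don't" becomes "dont". If that matters, a narrower character class is probably the better choice.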

Answer 2

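This also works for the edge case where two consecutive words straddle the end of one sentence and the start of the next. Something like "A cat. The watcher" will not match because, technically, they are not consecutive words: there is a dot between them.

The code treats the dot as a "word": it replaces each dot in the text with a dot that has a space on either side, like " . ". The dots therefore act as "connecting words" between sentences. This removes the need for special handling of the edge case, because a dot between two words means they can never match as two consecutive words. The text then has any extra spaces removed and is split into words: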

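const words = ['cat', 'dog', 'and', 'the']
const text = 'There is a dog and cat over there. A cat. The cat likes the dog.'
// Pad each dot with spaces so it becomes its own token, collapse repeated
// spaces, then split into tokens.
const xs = text.toLowerCase().replace(/\./g, " . ").replace(/ +(?= )/g, '').split(' ')

const result = []
let matched = []

xs.forEach(x => {
    if (words.includes(x))
        matched.push(x)
    else {
        // A non-matching token (including ".") ends the current run.
        if (matched.length > 1)
            result.push(matched.join(' '))
        matched = []
    }
})

// Flush a run that reaches the end of the text (e.g. no trailing dot).
if (matched.length > 1)
    result.push(matched.join(' '))

console.log(result)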

Result: ['dog and cat', 'the cat', 'the dog']
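
The same boundary-token idea can probably be extended to other sentence terminators. The sketch below is a hypothetical variant, not the answerer's code: it assumes that `.`, `!` and `?` are the only terminators you care about, and the `findRuns` name is made up for illustration.

```javascript
const words = ['cat', 'dog', 'and', 'the'];

// Hypothetical variant (an assumption): any of . ! ? becomes a standalone
// boundary token, so runs can never span two sentences.
function findRuns(text, wordList) {
  const tokens = text
    .toLowerCase()
    .replace(/[.!?]/g, ' $& ') // pad each terminator with spaces
    .split(/\s+/)
    .filter(Boolean);
  const result = [];
  let matched = [];
  // The extra '' sentinel flushes a run that ends at the very last token.
  for (const token of [...tokens, '']) {
    if (wordList.includes(token)) {
      matched.push(token);
    } else {
      if (matched.length > 1) result.push(matched.join(' '));
      matched = [];
    }
  }
  return result;
}

console.log(findRuns('Where is the cat? The dog and the cat', words));
// [ 'the cat', 'the dog and the cat' ]
```

Without the `?` boundary token, "the cat" and "The dog and the cat" would merge into one run of six words.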

Answer 3

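I would do this with two reduces: one that groups by accumulating consecutive words from the target set into arrays, and another that rejects the empty arrays (where runs ended) and joins the consecutive sets...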

const words = ['cat','dog','and','the'];
const wordSet = new Set(words); // optional for O(1) lookup
const string = 'There is a dog and cat over there. The cat likes the dog.';
// split on spaces and periods, force lower case
const tokens = string.split(/[ .]+/).map(t => t.toLowerCase());

const result = tokens
  .reduce((acc, word) => {
    // extend the current run on a match, otherwise start a new empty run
    if (wordSet.has(word)) acc[acc.length-1].push(word);
    else acc.push([]);
    return acc;
  }, [[]])
  .reduce((acc, run) => {
    // drop empty arrays and single-word runs (the question asks for two or more words)
    if (run.length > 1) acc.push(run.join(' '));
    return acc;
  }, []);

console.log(result);
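
To see why the second pass is needed, it may help to look at what the grouping reduce produces on its own. This is a small illustrative sketch with a made-up token list, not additional code from the answer.

```javascript
const wordSet = new Set(['cat', 'dog', 'and', 'the']);
// Made-up token list, chosen only to illustrate the intermediate value.
const tokens = ['a', 'dog', 'and', 'cat', 'over', 'the'];

// Grouping pass only: every non-matching token opens a new (empty) run.
const runs = tokens.reduce((acc, word) => {
  if (wordSet.has(word)) acc[acc.length - 1].push(word);
  else acc.push([]);
  return acc;
}, [[]]);

console.log(runs);
// [ [], [ 'dog', 'and', 'cat' ], [ 'the' ] ]
```

The leading empty array comes from the non-matching token "a", and the trailing `['the']` is a single-word run, so a filter on the run length is what turns this into the final answer.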

Answer 4

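This problem can be solved by "walking" the sentence: start a walk at each word and continue it until a word in the sentence is no longer found in the array.

For example, the first iteration starts at the first word of the sentence and checks whether it is in the array. If it is not, start again from the second word. If the word is present, check the next one; if that one is not in the array, the walk ends, and if it is, continue.

Two while loops make this possible. A regex .replace call strips non-letter characters (such as punctuation) for the membership test, and uppercase is converted to lowercase for the comparison: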

sentenceWordArray[position].toLowerCase().replace(/[^a-z]+/g, '')

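A break statement is needed in the inner while loop to prevent an out-of-bounds error if the position goes past the length of the sentence word array.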
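
As an alternative to the break, the lookup itself can be made safe with optional chaining. This is only a sketch under that assumption; `normalizedAt` is a hypothetical helper name and does not appear in the answer.

```javascript
const dictionary = ['cat', 'dog', 'and', 'the'];

// Hypothetical helper (an assumption, not from the original answer):
// normalize the token at `position`, or return undefined when `position`
// is past the end of the array. The ?. operator short-circuits the whole
// call chain, so no TypeError is thrown.
function normalizedAt(tokens, position) {
  return tokens[position]?.toLowerCase().replace(/[^a-z]+/g, '');
}

const tokens = 'Pet the dog.'.split(' ');
let position = 1;
const run = [];
// The loop stops by itself at the end: undefined is never in the dictionary.
while (dictionary.includes(normalizedAt(tokens, position))) {
  run.push(normalizedAt(tokens, position));
  position++;
}

console.log(run);      // [ 'the', 'dog' ]
console.log(position); // 3
```

Whether this reads better than an explicit break is a matter of taste; the explicit check makes the boundary condition more visible.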

Working snippet:

const words = ['cat','dog','and','the'];
const sentence = "There is a dog and cat over there. The cat likes the dog.";

function matchWordRuns(sentence, dictionary) {
  const sentenceWordArray = sentence.split(" ");
  const results = [];
  let position = 0;
  const currentSearch = [];

  while (position < sentenceWordArray.length) {
    while (dictionary.indexOf(sentenceWordArray[position].toLowerCase().replace(/[^a-z]+/g, '')) > -1) {
      currentSearch.push(sentenceWordArray[position].toLowerCase().replace(/[^a-z]+/g, ''));
      position++;
      if (position >= sentenceWordArray.length) {
        break; // avoid reading past the end of the array
      }
    } // end while word matched

    if (currentSearch.length > 1) { // keep only runs of two or more words
      results.push(currentSearch.join(" "));
    } // end if

    position++;
    currentSearch.length = 0; // empty array
  } // end while, search over

  return results;
} // end function

console.log(matchWordRuns(sentence, words));

/* result:
[
  "dog and cat",
  "the cat",
  "the dog"
]
*/

Answer 5

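Same idea as pilchard's, but with a few improvements:

Uses a regular expression with Unicode character classes to know what a "letter" is and where a sentence ends, so we don't need to list punctuation explicitly, and it should work in any language (e.g. "日本語！", which has no "." and does not match [a-z])
The results are built from substrings of the original string, so case and any punctuation in between are preserved (which may or may not be what the OP wants; if necessary, pass them through .toLowerCase and .replace again)
Set for efficiency (assuming string and words are long enough to make it worthwhile)
A generator function for more flexibility, and just because I don't see them often :P
Handles sentences separately, so it won't detect "cat. The dog"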

const words = ['cat','dog','and','the'];
const string = "There is a dog and cat over there. The cat likes the dog.";

function* findConsecutive(words, string) {
  const wordSet = new Set(words.map(word => word.toLowerCase()));
  // Split into sentences so that runs never span a sentence boundary.
  const sentences = string.split(/\s*\p{Sentence_Terminal}+\s*/u);
  for (const sentence of sentences) {
    let start = null, end, count = 0;
    const re = /\p{Letter}+/gu;
    let match; // declared explicitly; the original relied on an implicit global
    while ((match = re.exec(sentence)) !== null) {
      if (wordSet.has(match[0].toLowerCase())) {
        start ??= match.index;
        end = match.index + match[0].length;
        count++;
      } else {
        // A non-matching word ends the run; keep it only if it has two or more words.
        if (count >= 2) yield sentence.substring(start, end);
        start = null;
        count = 0;
      }
    }
    // Flush a run that reaches the end of the sentence.
    if (count >= 2) yield sentence.substring(start, end);
  }
}

console.log([...findConsecutive(words, string)]);
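
The two Unicode property escapes carry most of the weight here, so a short standalone sketch of what they match may be useful. The sample text is made up for illustration; the regular expressions are the same ones used above.

```javascript
// Made-up sample text mixing scripts and sentence terminators.
const text = '日本語！ Это кот. The dog';

// \p{Sentence_Terminal} matches '！' and '.', among other terminators.
const sentences = text.split(/\s*\p{Sentence_Terminal}+\s*/u);
console.log(sentences);
// [ '日本語', 'Это кот', 'The dog' ]

// \p{Letter}+ matches a run of letters in any script.
const letters = sentences.map(s => s.match(/\p{Letter}+/gu) ?? []);
console.log(letters);
// [ [ '日本語' ], [ 'Это', 'кот' ], [ 'The', 'dog' ] ]
```

One caveat worth keeping in mind: a period inside an abbreviation such as "e.g." is also a sentence terminator as far as this split is concerned.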

5 solutions

Solution 1
1 2022-06-21 23:12:33

Solution 2
0 2022-06-21 23:23:31

Solution 3
0 2022-06-21 23:37:01

Solution 4
0 2022-06-21 23:56:42

Solution 5
0 2022-06-22 00:10:10
