Find where words from an array are adjacent to each other in a string

Question

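Suppose I have a sentence or two in a string, and I have an array of words. I need to find every place in the string where two or more words from the array are next to each other.

Example:

Words: ['cat','dog','and','the']

String: There is a dog and cat over there. The cat likes the dog.

Result: ['dog and cat','the dog','the cat']

The only way I've been able to do this is by manually specifying the possible combinations, and only for up to 3 words, because it gets long very quickly.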

Answer 1

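You can iterate over the array with two pointers that track the start and end of each run of words contained in the words array. Here the string is first converted into an array of lowercase words with punctuation removed (you will need to expand the set of characters to remove).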

const words = ['cat', 'dog', 'and', 'the'];
const string = 'There is a dog and cat over there. The cat likes the dog.';

// Lowercase the text, strip punctuation, and split it into words
const stringArray = string.toLowerCase().replace(/[.,]/g, '').split(' ');
const result = [];
let start = 0;
let end = 0;

while (start < stringArray.length) {
  if (words.includes(stringArray[start])) {
    // Advance end past every consecutive word that is in the words array
    end = start + 1;
    while (words.includes(stringArray[end])) {
      end++;
    }
    // Keep only runs of two or more words
    if (end - start >= 2) {
      result.push(stringArray.slice(start, end).join(' '));
    }
    start = end;
  }
  start++;
}

console.log(result);
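One way to handle the note about expanding the set of characters to remove is to strip everything that is not a letter, digit, or whitespace, rather than listing punctuation marks one by one. The following is a minimal sketch under that assumption; `tokenize` and `findRuns` are hypothetical names, not part of the answer above. Note that apostrophes are stripped too, so "don't" becomes "dont".

```javascript
// Hypothetical helper: lowercase the text, drop every character that is not
// a Unicode letter, digit, or whitespace, then split on runs of whitespace.
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, '')
    .split(/\s+/)
    .filter(Boolean); // discard empty tokens from leading/trailing whitespace
}

// The same two-pointer scan, wrapped in a function (the name is an assumption).
function findRuns(words, text) {
  const tokens = tokenize(text);
  const result = [];
  let start = 0;
  while (start < tokens.length) {
    if (words.includes(tokens[start])) {
      let end = start + 1;
      while (end < tokens.length && words.includes(tokens[end])) end++;
      // Keep only runs of two or more words
      if (end - start >= 2) result.push(tokens.slice(start, end).join(' '));
      start = end;
    }
    start++;
  }
  return result;
}

console.log(findRuns(['cat', 'dog', 'and', 'the'], 'Is that the dog? Yes, the cat and dog!'));
// [ 'the dog', 'the cat and dog' ]
```

Like the code above, this treats words on either side of a sentence boundary as adjacent, because the punctuation is removed before scanning.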

Answer 2

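This also handles the edge case where two consecutive words fall between the end of one sentence and the start of the next. Something like "A cat. The watcher" will not match, because technically they are not consecutive words. There is a period between them.

The code treats the period as a "word": it first removes the periods from the text and then re-inserts them with a space on either side, as in " . ". The period therefore acts as a "connecting word" between sentences. This removes the need for any special handling of the edge case, because a period between two words means they can never match as two consecutive words. The text then has any extra spaces removed and is split into words: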

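const words = ['cat', 'dog', 'and', 'the']
const text = 'There is a dog and cat over there. A cat. The cat likes the dog.'
// Pad each period with spaces so it becomes its own token, then collapse repeated spaces
const xs = text.toLowerCase().replace(/\./g, ' . ').replace(/ +(?= )/g, '').split(' ')

const result = []
let matched = []

xs.forEach(x => {
    if (words.includes(x))
        matched.push(x)
    else {
        // Any non-matching token (including ".") ends the current run
        if (matched.length > 1)
            result.push(matched.join(' '))
        matched = []
    }
})

// Flush a run that reaches the end of a text with no trailing period
if (matched.length > 1)
    result.push(matched.join(' '))

console.log(result)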

Result: ['dog and cat', 'the cat', 'the dog']
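The same trick can be extended to other sentence-ending marks. The sketch below assumes that ".", "!" and "?" should all act as separators; `findRunsBySentence` is a hypothetical name, and other punctuation such as commas would still need to be stripped separately.

```javascript
// Hypothetical variant: treat ".", "!" and "?" as standalone separator tokens.
function findRunsBySentence(words, text) {
  const tokens = text
    .toLowerCase()
    .replace(/[.!?]/g, ' $& ') // pad each mark with spaces so it becomes its own token
    .split(/\s+/)
    .filter(Boolean);          // drop empty strings left by the padding

  const result = [];
  let matched = [];
  // A trailing "." sentinel flushes a run that reaches the end of the text.
  // This assumes "." itself is not one of the words being searched for.
  for (const token of [...tokens, '.']) {
    if (words.includes(token)) {
      matched.push(token);
    } else {
      if (matched.length > 1) result.push(matched.join(' '));
      matched = [];
    }
  }
  return result;
}

console.log(findRunsBySentence(
  ['cat', 'dog', 'and', 'the'],
  'Where is the cat? The dog and cat left! Dog'
));
// [ 'the cat', 'the dog and cat' ]
```

Here "cat? The" does not produce a run across the question mark, and the final lone "Dog" is dropped because it is a single word.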

Answer 3

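I would do this with two reduces: one that groups by accumulating consecutive words from the target set into arrays, and a second that discards the empty arrays (which mark the ends of runs) and any single-word runs, then joins each remaining run...

const words = ['cat', 'dog', 'and', 'the'];
const wordSet = new Set(words); // optional, for O(1) lookup
const string = 'There is a dog and cat over there. The cat likes the dog.';

// Split on spaces and periods, and force lower case
const tokens = string.split(/[ .]+/).map(t => t.toLowerCase());

const result = tokens
  .reduce((acc, word) => {
    // A matching word extends the current run; anything else starts a new, empty run
    if (wordSet.has(word)) acc[acc.length - 1].push(word);
    else acc.push([]);
    return acc;
  }, [[]])
  .reduce((acc, run) => {
    // The question asks for two or more adjacent words, so skip shorter runs
    if (run.length > 1) acc.push(run.join(' '));
    return acc;
  }, []);

console.log(result);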
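To make the grouping step easier to follow, here is a sketch that runs only the first reduce on a short input and prints the intermediate arrays. The input string is an assumption chosen for illustration.

```javascript
const wordSet = new Set(['cat', 'dog', 'and', 'the']);
const tokens = 'a dog and cat over the cat'.split(' ');

// First pass only: a matching word joins the last group,
// anything else opens a new (initially empty) group.
const groups = tokens.reduce((acc, word) => {
  if (wordSet.has(word)) acc[acc.length - 1].push(word);
  else acc.push([]);
  return acc;
}, [[]]);

console.log(groups);
// [ [], [ 'dog', 'and', 'cat' ], [ 'the', 'cat' ] ]
```

The empty array at the front is the initial accumulator, which never received a word before "a" opened the next group. The second reduce exists to filter out leftovers like this.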

Answer 4

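This problem can be solved by "walking" the sentence: start from each word in turn and continue each walk until a word in the sentence is no longer present in the array.

For example, the first iteration starts at the first word of the sentence and checks whether it is in the array. If it is not, start again from the second word. If the word is present, check the next one: if that is not in the array, the walk ends; if it is, continue.

Two while loops make this possible. A regex .replace call strips non-letter characters (such as punctuation) from each word in the presence test, and uppercase is converted to lowercase for the comparison: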

sentenceWordArray[position].toLowerCase().replace(/[^a-z]+/g, '')

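A break statement is needed in the inner while loop to prevent an out-of-bounds error if position goes past the length of the sentence word array.

Working snippet: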

const words = ['cat', 'dog', 'and', 'the'];
const sentence = "There is a dog and cat over there. The cat likes the dog.";

function matchWordRuns(sentence, dictionary) {
  const sentenceWordArray = sentence.split(" ");
  const results = [];
  let position = 0;
  const currentSearch = [];

  while (position < sentenceWordArray.length) {
    // Collect consecutive words that are in the dictionary
    while (dictionary.indexOf(sentenceWordArray[position].toLowerCase().replace(/[^a-z]+/g, '')) > -1) {
      currentSearch.push(sentenceWordArray[position].toLowerCase().replace(/[^a-z]+/g, ''));
      position++;
      if (position >= sentenceWordArray.length) {
        break; // avoid reading past the end of the array
      }
    } // end while word matched

    // The question asks for two or more adjacent words
    if (currentSearch.length > 1) {
      results.push(currentSearch.join(" "));
    } // end if

    position++;
    currentSearch.length = 0; // empty the array
  } // end while, search over

  return results;
} // end function

console.log(matchWordRuns(sentence, words));

/* result: [ "dog and cat", "the cat", "the dog" ] */
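The out-of-bounds guard can also be placed in the inner loop's condition, which removes the need for the break statement. The following is a sketch of that variant, keeping only runs of two or more words as the question asks; `normalize` and `matchWordRunsGuarded` are hypothetical names, not part of the answer above.

```javascript
// Hypothetical helper: lowercase a word and strip anything that is not a-z.
const normalize = (word) => word.toLowerCase().replace(/[^a-z]+/g, '');

function matchWordRunsGuarded(sentence, dictionary) {
  const sentenceWordArray = sentence.split(' ');
  const results = [];
  let position = 0;

  while (position < sentenceWordArray.length) {
    const currentSearch = [];
    // The bound is checked first, so sentenceWordArray[position]
    // is never undefined when normalize() is called.
    while (
      position < sentenceWordArray.length &&
      dictionary.includes(normalize(sentenceWordArray[position]))
    ) {
      currentSearch.push(normalize(sentenceWordArray[position]));
      position++;
    }
    if (currentSearch.length > 1) results.push(currentSearch.join(' '));
    position++; // skip the word that ended the run
  }
  return results;
}

console.log(matchWordRunsGuarded('The cat likes the dog and cat', ['cat', 'dog', 'and', 'the']));
// [ 'the cat', 'the dog and cat' ]
```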

Answer 5

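Same idea as pilchard's, but with a few improvements:

Uses a regular expression with Unicode character classes to know what a "letter" is and where a sentence ends, so we don't need to list punctuation explicitly, and it should work for any language (e.g. "日本語！", which has no "." and would not match [a-z])
The results are built from substrings of the original string, so case and any punctuation in between are preserved (this may or may not be what the OP wants; pass the results through .toLowerCase and .replace again if necessary)
A Set for efficiency (assuming string and words are long enough to make it worthwhile)
A generator function for more flexibility, and just because I don't see them very often :P
Sentences are processed separately, so it will not detect "cat. The dog"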

const words = ['cat', 'dog', 'and', 'the'];
const string = "There is a dog and cat over there. The cat likes the dog.";

function* findConsecutive(words, string) {
  const wordSet = new Set(words.map(word => word.toLowerCase()));
  // Split into sentences on any Unicode sentence-terminating punctuation
  const sentences = string.split(/\s*\p{Sentence_Terminal}+\s*/u);
  for (const sentence of sentences) {
    let start = null, end, count = 0;
    const re = /\p{Letter}+/gu;
    let match; // declared explicitly so the code also runs in strict mode
    while ((match = re.exec(sentence)) !== null) {
      if (wordSet.has(match[0].toLowerCase())) {
        start ??= match.index;
        end = match.index + match[0].length;
        count++;
      } else {
        // A non-matching word ends the run; keep it only if it has two or more words
        if (count > 1) yield sentence.substring(start, end);
        start = null;
        count = 0;
      }
    }
    // Flush a run that reaches the end of the sentence
    if (count > 1) yield sentence.substring(start, end);
  }
}

console.log([...findConsecutive(words, string)]);
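To see what the two Unicode property escapes do on their own, here is a small sketch that applies them to a mixed-script input. The sample strings are assumptions chosen for illustration, and the snippet requires a JavaScript engine that supports Unicode property escapes in regular expressions.

```javascript
// Split on sentence-terminating punctuation from any script.
// "！" (full-width exclamation mark) counts as a terminator just like ".".
const sentences = '日本語！ Dog and cat.'.split(/\s*\p{Sentence_Terminal}+\s*/u);
console.log(sentences);
// [ '日本語', 'Dog and cat', '' ]

// Extract runs of letters, ignoring punctuation and spaces.
const letterRuns = 'The cat, the dog'.match(/\p{Letter}+/gu);
console.log(letterRuns);
// [ 'The', 'cat', 'the', 'dog' ]
```

The trailing empty string appears because the input ends with a terminator. It contains no letters, so a loop over the letter runs of each sentence simply finds nothing in it.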

5 solutions

Solution 1
1 2022-06-21 23:12:33

Solution 2
0 2022-06-21 23:23:31

Solution 3
0 2022-06-21 23:37:01

Solution 4
0 2022-06-21 23:56:42

Solution 5
0 2022-06-22 00:10:10