Extract text between newlines that contains a match

Question

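I'm trying to use JS to extract paragraphs from OCR'd contracts when a paragraph contains a key search term. A user might search for something like "ship early" to find clauses about whether a particular customer's order can be shipped early.

I've been banging my head against the regex wall for quite a while and am clearly just not grasping something.

If I have text like this and I'm searching for the word "match":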

let text = "\n\nThis is an example of a paragraph that has the word I'm looking for The word is Match. \n\nThis paragraph does not have the word I want."

I want to extract all the text between the double \n characters, and not return the second sentence in that string.

I've been trying some form of:

let string = `[^\n\n]*match[^.]*\n\n`;

let re = new RegExp(string, "gi");
let body = text.match(re);

However, this returns null. Oddly, if I remove the period from the string, it works (sort of):

[
  "This is an example of a paragraph that has the word I'm looking for The word is Match \n" +
    '\n'
]

Any help would be great.

Answer 1

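Extracting text that sits between identical delimiters and also contains some specific text is not really possible with a single regex unless you resort to context-matching hacks.

So you can simply split the text into paragraphs and keep the ones that contain your match: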

const results = text.split(/\n{2,}/).filter(x=>/\bmatch\b/i.test(x))

If you do not need a whole-word match, you can remove the word boundaries.

See the JavaScript demo:

let text = "\n\nThis is an example of a paragraph that has the word I'm looking for The word is Match. \n\nThis paragraph does not have the word I want.";
console.log(text.split(/\n{2,}/).filter(x => /\bmatch\b/i.test(x)));
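Since the search term comes from a user, the split-and-filter idea above can be wrapped in a small helper. This is only a sketch under stated assumptions: `escapeRegExp` and `findParagraphs` are hypothetical names, not part of the original answer, and allowing any whitespace between the words of a phrase is a design choice made here for OCR text, where a phrase may wrap across a line.

```javascript
// Hypothetical helper: escape regex metacharacters in a user-supplied term
// so that input such as "(a)" is matched literally.
function escapeRegExp(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return every paragraph that contains the term.
function findParagraphs(text, term) {
  // Assumption: words of a phrase may be separated by any run of whitespace.
  const pattern = term.trim().split(/\s+/).map(escapeRegExp).join("\\s+");
  if (pattern === "") return []; // an empty term would match everything
  const re = new RegExp(pattern, "i");
  return text
    .split(/\n{2,}/) // paragraphs are separated by two or more newlines
    .map((p) => p.trim())
    .filter((p) => p !== "" && re.test(p));
}

const contract =
  "\n\nOrders may ship early (if approved).\n\nPayment is due in 30 days.";
console.log(findParagraphs(contract, "ship early"));
console.log(findParagraphs(contract, "(if approved)"));
```

Both calls should log only the first paragraph. If whole-word matching is wanted, `\b` could be added around the pattern, as in the answer's own example.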

Answer 2

By default, . matches every character except a newline. So you can use the regex /.*match.*/, with a greedy .* on each side of the word:

const text = 'aaaa\n\nbbb match ccc\n\nddd';
const regex = /.*match.*/;
console.log(text.match(regex).toString());

Output:

bbb match ccc
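Applied to the text from the question, this pattern would likely need two flags: `i`, because the text contains "Match" with a capital M, and `g`, to collect every matching line rather than just the first. The sketch below assumes each paragraph sits on a single line, since `.` stops at the first newline; the variable names are illustrative only.

```javascript
const sample =
  "\n\nThis is an example of a paragraph that has the word I'm looking for The word is Match. \n\nThis paragraph does not have the word I want.";

// "g" returns all matching lines; "i" makes the search case-insensitive.
// match() returns null when nothing matches, so fall back to an empty array.
const lines = sample.match(/.*match.*/gi) ?? [];

// Trim the space that precedes the line break in the sample text.
console.log(lines.map((line) => line.trim()));
```

If OCR output breaks a paragraph across several lines, this approach would return only the line containing the word, so splitting on blank lines may be the safer choice in that case.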

Answer 3

Here are two approaches. I don't know why you need to use a regex. Splitting seems easier, doesn't it?

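const text = "\n\nThis is an example of a paragraph that has the word I'm looking for The word is Match. \n\nThis paragraph does not have the word I want.";

// Regex approach: capture the first line that sits between double newlines.
function getTextBetweenLinesUsingRegex(text) {
  // [^\n]+ matches one or more non-newline characters.
  const regex = /\n\n([^\n]+)\n\n/;
  const arr = regex.exec(text);
  // exec() returns null when nothing matches, so check before indexing.
  if (arr && arr.length > 1) {
    return arr[1];
  }
  return null;
}

console.log(`getTextBetweenLinesUsingRegex: ${getTextBetweenLinesUsingRegex(text)}`);

// Split approach: the text starts with "\n\n", so the first paragraph is at index 1.
console.log(`simple: ${text.split('\n\n')[1]}`);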
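As background on why the pattern in the question returns null: a character class matches a single character, so `[^\n\n]` means the same as `[^\n]`, and `[^.]*` cannot move past the period that directly follows "Match", which leaves the required `\n\n` unmatched. The sketch below is one possible adjustment, not the asker's or any answerer's code; it writes the question's pattern as a regex literal, which should behave the same as the `RegExp` built from the template string.

```javascript
const input =
  "\n\nThis is an example of a paragraph that has the word I'm looking for The word is Match. \n\nThis paragraph does not have the word I want.";

// The pattern from the question: after "Match" the next character is ".",
// so [^.]* matches nothing and the following \n\n cannot match.
const original = /[^\n\n]*match[^.]*\n\n/gi;
console.log(input.match(original)); // null

// Adjusted: let the tail run to the end of the line instead of
// stopping at a period, and do not require the trailing newlines.
const adjusted = /[^\n]*match[^\n]*/gi;
console.log(input.match(adjusted));
```

The adjusted pattern is equivalent to `/.*match.*/gi` and shares its limitation: it returns the matching line, not a paragraph that spans several lines.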

Answer 1: score 1, accepted, 2021-02-10 23:51:00
Answer 2: score 1, 2021-02-11 00:15:47
Answer 3: score -1, 2021-02-10 23:52:47
