How to use a regular expression in JavaScript to replace everything in a string except specific words

Question

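Imagine you have a string like this: "This is a sentence with words."

I have a list of words, such as $wordList = ["sentence", "words"];

I want to highlight the words that are not in the list. That means I need to find and replace everything else, and I can't work out how to do that with a regex (if it is possible at all).

If I wanted to match the listed words, I could do something like this: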

text = text.replace(/\b(sentence|words)\b/g, '<mark>$&</mark>');

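(This wraps the matched words in "mark" tags and, assuming I have some CSS that highlights them, it works perfectly.) But I need the opposite! I need it to select essentially the whole string and then exclude the listed words. I have tried /^((?!sentence|words)*)*$/gm, but that gives me a strange infinite-loop problem, I think because it is too open-ended.

Taking the original sentence, what I would like to end up with is "<mark>This is a </mark>sentence<mark> with some </mark>words<mark>.</mark>"

Basically, wrap (via replacement) everything except the listed words.

The closest I seem to have come is /^(?!sentence|words).*\b/igm, which handles the case where a line starts with one of the words (by ignoring that entire line).

To summarize: 1) take a string, 2) take a list of words, 3) replace everything in the string except the words in the list.

Is this possible? (jQuery is already loaded for other things, so either plain JS or jQuery is acceptable.)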

Answer 1

Create a regex from the word list, then replace on the string using that regex.
(It is a tricky regex.)

var wordList = ["sentence", "words"];

// Join the array into a string using '|'.
var str = wordList.join('|');
// Finalize the string with a negative assertion.
str = '\\W*(?:\\b(?!(?:' + str + ')\\b)\\w+\\W*|\\W+)+';

// Create a regex from the string.
var Rx = new RegExp( str, 'g' );
console.log( Rx );

var text = "%%%555This is a sentence with words, but not sentences ?!??!!...";
text = text.replace( Rx, '<mark>$&</mark>');
console.log( text );

Output

/\W*(?:\b(?!(?:sentence|words)\b)\w+\W*|\W+)+/g
<mark>%%%555This is a </mark>sentence<mark> with </mark>words<mark>, but not sentences ?!??!!...</mark>
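The pattern is easier to follow when it is assembled from named pieces. The sketch below rebuilds the same pattern step by step; the variable names (listed, unlistedWord, chunk) are illustrative and do not come from the answer.

```javascript
var wordList = ["sentence", "words"];

// The listed words as a non-capturing alternation.
var listed = '(?:' + wordList.join('|') + ')';

// A whole word that is NOT in the list. The negative lookahead rejects a
// listed word only when a word boundary follows it, so "sentences" passes.
var unlistedWord = '\\b(?!' + listed + '\\b)\\w+';

// One chunk is either an unlisted word plus any trailing non-word
// characters, or a run of non-word characters on its own.
var chunk = '(?:' + unlistedWord + '\\W*|\\W+)';

// Optional leading non-word characters, then one or more chunks.
var rx = new RegExp('\\W*' + chunk + '+', 'g');

var sample = "%%%555This is a sentence with words, but not sentences ?!??!!...";
var marked = sample.replace(rx, '<mark>$&</mark>');
console.log(rx.source);
console.log(marked);
```

Because every chunk must consume at least one character, the pattern can never match an empty string, which may be why it avoids the runaway behavior described in the question.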

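Addendum

The regex above assumes the word list contains only word characters.
If that is not the case, you have to match the listed words themselves so that the match position advances past them. This is easily done with a simplified regex and a callback function.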

var wordList = ["sentence", "words", "won't"];

// Join the array into a string using '|'.
var str = wordList.join('|');
str = '([\\S\\s]*?)(\\b(?:' + str + ')\\b|$)';

// Create a regex from the string.
var Rx = new RegExp( str, 'g' );
console.log( Rx );

var text = "%%%555This is a sentence with words, but won't be sentences ?!??!!...";

// Use a callback to insert the 'mark'.
text = text.replace( Rx,
    function(match, p1, p2)
    {
        var retStr = '';
        if ( p1.length > 0 )
            retStr = '<mark>' + p1 + '</mark>';
        return retStr + p2;
    }
);
console.log( text );

Output

/([\S\s]*?)(\b(?:sentence|words|won't)\b|$)/g
<mark>%%%555This is a </mark>sentence<mark> with </mark>words<mark>, but </mark>won't<mark> be sentences ?!??!!...</mark>
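Both snippets join the word list into the pattern as-is, so a listed word containing regex metacharacters (for example the "." in a hypothetical entry "node.js") would be read as pattern syntax. Here is a minimal sketch of one way to guard against that. The names escapeRegExp and markOutsideWords are my own, not part of the answer.

```javascript
// Hypothetical helper: backslash-escape characters that are special in a regex.
function escapeRegExp(s) {
    return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical wrapper around the addendum's approach, with escaping added.
function markOutsideWords(text, wordList) {
    var alternation = wordList.map(escapeRegExp).join('|');
    var rx = new RegExp('([\\S\\s]*?)(\\b(?:' + alternation + ')\\b|$)', 'g');
    return text.replace(rx, function (match, p1, p2) {
        // p1 is the text before a listed word (or before the end of input).
        return (p1.length > 0 ? '<mark>' + p1 + '</mark>' : '') + p2;
    });
}

// Without escaping, "node.js" would also match "nodexjs".
console.log(markOutsideWords("I use node.js, not nodexjs.", ["node.js"]));
```

Escaping does not change how \b behaves, so a listed word that begins or ends with a non-word character can still fail to match.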

Answer 2

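You can still perform the replacement on the positive matches, but reverse the closing and opening tags, then add an opening tag at the start of the string and a closing tag at the end. I am using your regex here, which could be anything you want, so I assume it correctly matches what needs to be matched: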

var text = "This is a sentence with words.";
text = "<mark>" + text.replace(/\b(sentence|words)\b/g, '</mark>$&<mark>') + "</mark>";
// If empty tags bother you, you can add:
text = text.replace(/<mark><\/mark>/g, "");
console.log(text);

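Time complexity

In the comments below, someone pointed out that the second (optional) replacement is a waste of time. But it has linear time complexity, as illustrated by the following snippet, which charts the duration for increasing string sizes.

The X axis shows the number of characters in the input string, and the Y axis shows the number of milliseconds it takes to run the replacement with /<mark><\/mark>/g on such an input string: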

// Reserve memory for the longest string
const s = '<mark></mark>' + '<mark>x</mark>'.repeat(2000),
    regex = /<mark><\/mark>/g,
    millisecs = {};
// Collect timings for several string sizes:
for (let size = 100; size < 25000; size+=100) {
    millisecs[size] = test(15, 8, _ => s.substr(0, size).replace(regex, ''));
}
// Show results in a chart:
chartFunction(canvas, millisecs, "len", "ms");

// Utilities
function test(countPerRun, runs, f) {
    let fastest = Infinity;
    for (let run = 0; run < runs; run++) {
        const started = performance.now();
        for (let i = 0; i < countPerRun; i++) f();
        // Keep the duration of the fastest run:
        fastest = Math.min(fastest, (performance.now() - started) / countPerRun);
    }
    return fastest;
}

function chartFunction(canvas, y, labelX, labelY) {
    const ctx = canvas.getContext('2d'),
        axisPix = [40, 20],
        largeY = Object.values(y).sort( (a, b) => b - a )[
                    Math.floor(Object.keys(y).length / 10)
                ] * 1.3, // add 30% to value at the 90th percentile
        max = [+Object.keys(y).pop(), largeY],
        coeff = [(canvas.width-axisPix[0]) / max[0], (canvas.height-axisPix[1]) / max[1]],
        textAlignPix = [-8, -13];
    ctx.translate(axisPix[0], canvas.height-axisPix[1]);
    text(labelY + "/" + labelX, [-5, -13], [1, 1], false, 2);
    // Draw axis lines
    for (let dim = 0; dim < 2; dim++) {
        const c = coeff[dim], world = [c, 1];
        let interval = 10**Math.floor(Math.log10(60 / c));
        while (interval * c < 30) interval *= 2;
        if (interval * c > 60) interval /= 2;
        let decimals = ((interval+'').split('.')[1] || '').length;
        line([[0, 0], [max[dim], 0]], world, dim);
        for (let x = 0; x <= max[dim]; x += interval) {
            line([[x, 0], [x, -5]], world, dim);
            text(x.toFixed(decimals), [x, textAlignPix[1-dim]], world, dim, dim+1);
        }
    }
    // Draw function
    line(Object.entries(y), coeff);

    function translate(coordinates, world, swap) {
        return coordinates.map( p => {
            p = [p[0] * world[0], p[1] * world[1]];
            return swap ? p.reverse() : p;
        });
    }

    function line(coordinates, world, swap) {
        coordinates = translate(coordinates, world, swap);
        ctx.beginPath();
        ctx.moveTo(coordinates[0][0], -coordinates[0][1]);
        for (const [x, y] of coordinates.slice(1)) ctx.lineTo(x, -y);
        ctx.stroke();
    }

    function text(s, p, world, swap, align) { // align: 0=left,1=center,2=right
        const [[x, y]] = translate([p], world, swap);
        ctx.font = '9px courier';
        ctx.fillText(s, x - 2.5*align*s.length, 2.5-y);
    }
}

 <canvas id="canvas" width="600" height="200"></canvas>

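For each string size (incremented in steps of 100 characters), the time needed to run the regex 15 times is measured. This measurement is repeated 8 times, and the duration of the fastest run is reported in the chart. On my PC, the regex runs in 25 µs on a string of 25000 characters (consisting of tags). So nothing to worry about ;-)

You may see some spikes in the chart (caused by browser and OS interference), but the overall trend is linear. Given that the main regex has linear time complexity, the overall time complexity is not negatively affected by it.

However, the optional part can be performed without a regular expression: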

if (text.substr(6, 7) === '</mark>') text = text.substr(13);
if (text.substr(-13, 6) === '<mark>') text = text.substr(0, text.length-13);

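Because of how JavaScript engines deal with strings (they are immutable), this longer code runs in constant time.

Of course, it does not change the overall time complexity, which remains linear.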
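The inverted-tag replacement and the two end checks can be combined into one function. This is only a sketch: the function name wrapInverted is hypothetical, and it uses startsWith, endsWith and slice in place of the substr calls in the answer.

```javascript
// Hypothetical helper: wrap everything in <mark>, "unwrap" each match, then
// drop an empty <mark></mark> pair at either end without a second regex pass.
function wrapInverted(text, rx) {
    var empty = '<mark></mark>';
    var out = '<mark>' + text.replace(rx, '</mark>$&<mark>') + '</mark>';
    if (out.startsWith(empty)) out = out.slice(empty.length);
    if (out.endsWith(empty)) out = out.slice(0, -empty.length);
    return out;
}

var rx = /\b(sentence|words)\b/g;
console.log(wrapInverted("This is a sentence with words.", rx));
console.log(wrapInverted("sentence with words", rx));
```

Only the two ends are checked, on the assumption that the \b anchors keep two matches from ever touching each other. A regex that allows adjacent matches would still need the global cleanup.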

Answer 3

I am not sure whether this works in all cases, but it does for the given string.

let s1 = "This is a sentence with words.";
let wordList = ["sentence", "words"];
let reg = new RegExp("([\\s\\S]*?)(" + wordList.join("|") + ")", "g");
console.log(s1.replace(reg, "<mark>$1</mark>$2"))
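One case this pattern appears to miss is any text after the last listed word, because a match only ends at a listed word. The sketch below shows that case and one possible adjustment that also lets a match end at the end of the input. The adjusted pattern and the names reg2, original and adjusted are my own additions, not the answerer's code.

```javascript
let s1 = "This is a sentence with words.";
let wordList = ["sentence", "words"];

// Pattern from the answer: the trailing "." is never wrapped.
let reg = new RegExp("([\\s\\S]*?)(" + wordList.join("|") + ")", "g");
let original = s1.replace(reg, "<mark>$1</mark>$2");

// Adjusted pattern (an assumption): "|$" lets the last match stop at the end
// of the input; the callback skips empty prefixes to avoid empty tags.
let reg2 = new RegExp("([\\s\\S]*?)(" + wordList.join("|") + "|$)", "g");
let adjusted = s1.replace(reg2, (m, p1, p2) =>
    (p1 ? "<mark>" + p1 + "</mark>" : "") + p2);

console.log(original);
console.log(adjusted);
```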

Answer 4

Do it the other way around: mark everything, then unmark the listed words that match.

text = `<mark>${text.replace(/\b(sentence|words)\b/g, '</mark>$&<mark>')}</mark>`;

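A negated regex is possible but inefficient. In fact, a regex is not the right tool here. A workable approach is to walk through the string and build the resulting string manually: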

//var text = "This is a sentence with words.";
//var wordlist = ["sentence", "words"];
var result = "";
var marked = false;
var nextIndex = 0;

while (nextIndex != -1) {
    var endIndex = text.indexOf(" ", nextIndex + 1);
    var substring = text.slice(nextIndex, endIndex == -1 ? text.length : endIndex);
    var contains = wordlist.some(word => substring.includes(word));
    if (!contains && !marked) {
        result += "<mark>";
        marked = true;
    }
    if (contains && marked) {
        result += "</mark>";
        marked = false;
    }
    result += substring;
    nextIndex = endIndex;
}

if (marked) {
    result += "</mark>";
}
text = result;
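Note that the loop above tests each space-delimited chunk with includes, so a chunk such as " sentences" counts as listed and is left unmarked. If exact word matches are wanted, one possible variant walks over word and non-word tokens instead. The function name markByTokens is hypothetical, and the sketch does use a small regex, but only to split the text.

```javascript
function markByTokens(text, wordList) {
    const listed = new Set(wordList);
    let result = "";
    let run = ""; // pending text that is not a listed word
    // The capture group makes split() keep the words, so tokens alternate
    // between non-word runs and words.
    for (const token of text.split(/(\w+)/)) {
        if (listed.has(token)) {
            // Close the pending run before emitting the listed word unmarked.
            if (run) result += "<mark>" + run + "</mark>";
            run = "";
            result += token;
        } else {
            run += token;
        }
    }
    if (run) result += "<mark>" + run + "</mark>";
    return result;
}

console.log(markByTokens("This is a sentence with words, but not sentences.",
                         ["sentence", "words"]));
```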

4 answers

Answer 1: score 5, accepted, 2017-08-18 23:29:41
Answer 2: score 3, 2017-08-18 23:52:19
Answer 3: score 1, 2017-08-18 23:29:52
Answer 4: score 1, 2017-12-14 22:34:48
