
Replace quotes (but don't touch nested quotes)

I've run into the following problem: I need to replace straight quotes with angle quotes, but if the quoted sentence contains quotes of its own, those shouldn't be replaced.

To get the opening quote I use:

const regexStartQuote = /"(?=\S)/gm;
const replaceStartQuote = '«';

To replace a quote with the closing one I use:

// const regexEndQuote = /(?<=\S)"/gm; // lookbehind, not supported in Firefox at the time of writing
const regexEndQuote = /"(?=\s)/gm;
const replaceEndQuote = '»';

This works: "Some text" -> «Some text»
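A quick way to see where these two patterns stop being enough is to run them one after the other on a plain sentence and on a nested one. The helper name `convertQuotes` is hypothetical. Note that `regexEndQuote` only fires when the quote is followed by whitespace, so a closing quote at the very end of the text stays straight until the next space is typed.

```javascript
// The two patterns from the question, applied one after the other.
const regexStartQuote = /"(?=\S)/gm; // quote followed by a non-space char: treated as opening
const replaceStartQuote = '«';
const regexEndQuote = /"(?=\s)/gm;   // quote followed by whitespace: treated as closing
const replaceEndQuote = '»';

// Hypothetical helper, for illustration only.
function convertQuotes(text) {
  return text
    .replace(regexStartQuote, replaceStartQuote)
    .replace(regexEndQuote, replaceEndQuote);
}

// Plain sentence (a trailing space is included so the closing quote can match).
console.log(convertQuotes('"Some text" '));
// prints «Some text» followed by the trailing space

// Nested quotes: the inner pair is converted as well, which is the problem.
console.log(convertQuotes('"Some text "Text in quotes" something more" '));
// prints «Some text «Text in quotes» something more» followed by the trailing space
```

Neither pattern knows whether a quotation is already open, so an inner opening quote looks exactly like an outer one.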

By the way, I'm working with Draft.js, and these changes are applied on the fly.

Now I need to extend the existing regexes so that quotes nested inside a quotation are left alone, giving something like:

«Some text "Text in quotes" something more»

And, of course, variants such as:

«Some text "Text in quotes", something more»

«Some text: "Text in quotes", something more»

«Some text: "Text in quotes",- something more»

UPDATE

The flow of the program is as follows: each typed character is appended to the string. For example, starting with an empty text block:

the string is just `` (empty),

then the user types 'w' -> the string becomes w,

then 'o' -> the string is wo,

then 'w' -> the string is wow,

then ' ' (space) -> the string is 'wow ' (with a trailing space),

then " -> the string is wow «

and so on.

As I understand it, the rule should be something like:

If the user types " and there is a « before it but no » after that «, we shouldn't change the ".
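One way to turn that rule into code, sketched under several assumptions: the handler sees only the text to the left of the cursor, each straight quote is decided at the moment it is typed, and only one level of nesting is supported. `quoteFor` and `typeText` are hypothetical names (not Draft.js APIs), and the rule that a quote after whitespace, «, :, ( or - opens an inner pair is a heuristic added here, not part of the question.

```javascript
// Hypothetical per-keystroke rule: given the text before the cursor,
// decide what to insert when the user types a straight quote.
function quoteFor(before) {
  const lastOpen = before.lastIndexOf('«');
  const lastClose = before.lastIndexOf('»');

  // No unclosed « to the left: this quote opens an outer quotation.
  if (lastOpen === -1 || lastClose > lastOpen) return '«';

  // Inside «...»: count the straight quotes typed since the «.
  const inner = before.slice(lastOpen + 1).split('"').length - 1;
  if (inner % 2 === 1) return '"'; // an inner "..." pair is open: close it

  // Even count (heuristic): after whitespace or «, :, (, - the quote opens
  // an inner pair; otherwise it closes the outer quotation.
  const prev = before.charAt(before.length - 1);
  return /[\s«:(-]/.test(prev) ? '"' : '»';
}

// Simulate typing one character at a time, as in the flow described above.
function typeText(keys) {
  let s = '';
  for (const key of keys) {
    s += key === '"' ? quoteFor(s) : key;
  }
  return s;
}

console.log(typeText('"Some text "Text in quotes" something more"'));
// «Some text "Text in quotes" something more»
```

The heuristic misfires when a closing quote directly follows a space (for example `«text "`), which it treats as an inner opening quote.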

Try this solution:

const startRegex = /^"/gm;
const endRegex = /"$/gm;

let result = str.replace(startRegex, "«");
result = result.replace(endRegex, "»");

const startRegex = /^"/gm;
const endRegex = /"$/gm;
const str = `"Some text "Text in quotes" something more"`;
let result = str.replace(startRegex, "«");
result = result.replace(endRegex, "»");
console.log(result); // «Some text "Text in quotes" something more»
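Because `^` and `$` with the `m` flag anchor to line boundaries, this approach only converts quotes that sit at the very start or end of a line; a quotation in the middle of a line is left alone. A short check of that behavior, using the same two anchored patterns with the « and » characters the question asks for (the wrapper name `anchorQuotes` is hypothetical):

```javascript
// Hypothetical wrapper around the two anchored replacements.
function anchorQuotes(str) {
  return str.replace(/^"/gm, '«').replace(/"$/gm, '»');
}

// Outer quotes at the line edges are converted; inner quotes stay straight.
console.log(anchorQuotes('"Some text "Text in quotes" something more"'));
// «Some text "Text in quotes" something more»

// A quotation in the middle of a line is not touched at all.
console.log(anchorQuotes('He said "Some text" yesterday.'));
// He said "Some text" yesterday.

// With the m flag, every line is handled on its own.
console.log(anchorQuotes('"One"\n"Two"'));
```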

This handles nested quoted strings that are contained within a single line (the quoted string itself does not have to begin at the start of the line or end at the end of it). This restriction is somewhat artificial, but if you want to allow multiple internal quoted strings within the outer quoted string, it becomes almost a necessity. Otherwise, here is the problem. Consider the following string:

var s = '"This is an "internal quote" within a sentence." A short sentence.\n' +
        '"Another quoted sentence."\n' +
        '"Yet another quoted sentence."' +
        'etc.';

What prevents " A short sentence.\n" and "\n", for example, from being recognized as internal quoted strings? In other words, it becomes impossible to tell whether a quote marks the end of the outer quoted string or the start of a new internal one (at least until you reach the end of the entire input).

The regex: ^([^"\n]*)"((?:[^"\n]*"[^"\n]*")*[^"\n]*)"([^"\n]*)$

  1. ^ Matches the start of the line.
  2. ([^"\n]*) Capture group 1: 0 or more characters other than " or newline. This is everything on the line that precedes the opening quote.
  3. " Matches the opening quote. Now we look for optional quoted strings within the outer quotes.
  4. (?:[^"\n]*"[^"\n]*") A non-capturing group that matches 0 or more non-quote/non-newline characters, a quote, 0 or more non-quote/non-newline characters, and another quote. This is one internal quoted string (together with any text before it).
  5. (?:[^"\n]*"[^"\n]*")* The above pattern can be repeated 0 or more times.
  6. [^"\n]*" Matches 0 or more non-quote/non-newline characters followed by the closing quote. Everything between the opening and closing quotes (items 4 to 6, minus this final quote) is capture group 2, the content of the outer quoted string.
  7. ([^"\n]*) Capture group 3: the rest of the line (0 or more characters), which must not include a quote.
  8. $ Matches the end of the line.
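The breakdown above can be mirrored by assembling the pattern from named pieces, which makes each part easier to check. This is a sketch: the constant names are hypothetical, and it uses `[^"\n]*` for the last group, as the description of that group requires.

```javascript
// Building blocks, one per item in the list above.
const NOT_QUOTE = '[^"\\n]';                          // any char except " or newline
const before = `(${NOT_QUOTE}*)`;                     // group 1: text before the opening quote
const inner = `(?:${NOT_QUOTE}*"${NOT_QUOTE}*")`;     // one internal "..." string, with text before it
const content = `(${inner}*${NOT_QUOTE}*)`;           // group 2: internal strings, then the rest
const after = `(${NOT_QUOTE}*)`;                      // group 3: rest of the line, no quotes

const balanced = new RegExp(`^${before}"${content}"${after}$`, 'gm');

const lines = [
  '"Some text "Text in quotes" something more"',
  '"Some text: "Text in quotes",- something more"',
  'An "unbalanced "quote here"', // odd number of quotes: left unchanged
].join('\n');

console.log(lines.replace(balanced, '$1«$2»$3'));
```

The third line has three quotes, so no split into balanced pairs exists and the regex does not match it.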


The above regex is fairly complicated because it checks for balanced quotes. If you do not need such rigid checking, a simpler regex that only looks for the first and last quotes on a line is the following (the rest of the code stays the same):

/^([^"\n]*)"([^\n]*)"([^"\n]*)$/gm;
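To see what the simpler pattern gives up, compare both patterns on a line with an odd number of quotes. Here the `*` sits inside group 3; with `([^"\n])*` as a repeated group, `$3` would hold only the last character of the tail rather than the whole tail. The variable names are illustrative.

```javascript
// Simpler pattern: from the first quote to the last quote on each line.
const simple = /^([^"\n]*)"([^\n]*)"([^"\n]*)$/gm;
// Balanced pattern, for comparison.
const balanced = /^([^"\n]*)"((?:[^"\n]*"[^"\n]*")*[^"\n]*)"([^"\n]*)$/gm;

const line = 'An "unbalanced "quote here" today';

// The simple pattern still converts the first and last quotes.
console.log(line.replace(simple, '$1«$2»$3'));
// An «unbalanced "quote here» today

// The balanced pattern refuses the line, since the quotes cannot be paired.
console.log(line.replace(balanced, '$1«$2»$3'));
// An "unbalanced "quote here" today
```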

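var s = 'A plain line.\n' +
        'This is "Some text in quotes" and some without.\n' +
        '"This has "quotes within quotes" and some without."\n' +
        '"This has "many" "quoted" "strings" within quotes."';
var regex = /^([^"\n]*)"((?:[^"\n]*"[^"\n]*")*[^"\n]*)"([^"\n]*)$/gm;
console.log(s.replace(regex, "$1«$2»$3"));
// A plain line.
// This is «Some text in quotes» and some without.
// «This has "quotes within quotes" and some without.»
// «This has "many" "quoted" "strings" within quotes.»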

Update

To modify the input s as it is entered, test it against several regular expressions:

  1. If the input matches /^[^"\n]*$/ (no quote on the line), no replacement is necessary.
  2. If the input matches /^[^«\n]*«([^»\n]*»)?[^"\n]*$/ (a « has been typed, and no straight quote follows it or follows its closing »), no replacement is necessary.
  3. If the input matches /^([^"«\n]*)"$/ (first quote seen), then s = s.replace('"', '«');
  4. If the input matches /^([^"«\n]*)«([^\n]*)"$/ (a later quote seen), turn the previous closing » back into a straight quote and make the new quote the closing mark: s = s.replace('»', '"'); s = s.replace(/"$/, '»');

Code snippets don't seem to allow true one-character-at-a-time input, but this one simulates what it would look like:

function test(str) {
  let s = '';
  for (let i = 0; i < str.length; i++) {
    const key = str.charAt(i);
    s += key;
    if (/^[^"\n]*$/.test(s) || /^[^«\n]*«([^»\n]*»)?[^"\n]*$/.test(s)) {
      // rules 1 and 2: nothing to replace
    } else if (/^([^"«\n]*)"$/.test(s)) {
      s = s.replace('"', '«'); // rule 3: first quote opens
    } else if (/^([^"«\n]*)«([^\n]*)"$/.test(s)) {
      s = s.replace('»', '"'); // rule 4: old closing mark becomes "
      s = s.replace(/"$/, '»'); // and the new quote closes
    }
    console.log("\n" + s);
  }
}
test('a"bc"de"fg"h"ij"');
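The same four rules can also be written as a pure function of the current text, which is closer to how an on-the-fly editor hook would call them, and checked against the sentences from the question. The names `applyQuoteRules` and `simulateTyping` are hypothetical.

```javascript
// The four rules above as a pure function: given the text after the
// latest keystroke, return the text that should be displayed.
function applyQuoteRules(s) {
  // Rules 1 and 2: nothing to replace.
  if (/^[^"\n]*$/.test(s) || /^[^«\n]*«([^»\n]*»)?[^"\n]*$/.test(s)) return s;
  // Rule 3: the first quote on the line opens the quotation.
  if (/^([^"«\n]*)"$/.test(s)) return s.replace('"', '«');
  // Rule 4: the old closing mark becomes a straight quote,
  // and the quote just typed becomes the new closing mark.
  if (/^([^"«\n]*)«([^\n]*)"$/.test(s)) return s.replace('»', '"').replace(/"$/, '»');
  return s;
}

// Feed the text one character at a time.
function simulateTyping(text) {
  let s = '';
  for (const key of text) s = applyQuoteRules(s + key);
  return s;
}

console.log(simulateTyping('"Some text "Text in quotes" something more"'));
// «Some text "Text in quotes" something more»
```

Because the last quote typed always becomes the closing », the inner quotes only settle into straight quotes once the outer quotation is closed.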

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this post, please include the site URL or the original address. For questions, contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM