
JavaScript Regex replace words with their first letter except when within parentheses

I am looking for a JavaScript regex that will replace the words in a block of text with only the first letter of each word, BUT if there are words within parentheses, leave them intact along with the parentheses. The purpose is to create a mnemonic device for remembering lines from a screenplay or theatrical script. I want the actual lines reduced to first letters but the stage directions (in parentheses) to be unaltered.

For example:

Test test test (test). Test (test test) test test.

Would yield the result:

T t t (test). T (test test) t t.

Using:

 .replace(/(\w)\w*/g,'$1')

Yields:

T t t (t). T (t t) t t.

My understanding of regex is very poor. I have been researching it for several days now and have tried many things but can't seem to wrap my head around a solution.

You can accomplish this with a small tweak to your regex:

/(\w|\([^)]+\))\w*/

The added part \([^)]+\) matches a whole parenthesized group: an opening parenthesis, everything up to the next closing parenthesis, and the closing parenthesis itself.

"Test test test (test). Test (test test) test test.".replace(/(\w|\([^)]+\))\w*/g,'$1')
>"T t t (test). T (test test) t t."

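Edit: to address issues raised in the comments, the closing \) is dropped from the group, so that a word directly following a closing parenthesis, as in (test)test, is still abbreviated instead of being swallowed by the trailing \w*: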

"Test test test (test). Test (test. test.) test test. test(test) (test)test".replace(/(\w|\([^)]+)\w*/g,'$1')
>"T t t (test). T (test. test.) t t. t(test) (test)t"
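For reuse, the edited pattern can be wrapped in a small function. This is a minimal sketch; the name firstLetters is made up for illustration and is not from the answer.

```javascript
// Hypothetical helper; the pattern is the edited one shown above.
function firstLetters(text) {
  // Group 1 is either one word character, or "(" plus everything up to
  // (but not including) the next ")". The trailing \w* is matched but
  // not captured, so replacing with '$1' discards it.
  return text.replace(/(\w|\([^)]+)\w*/g, '$1');
}

console.log(firstLetters('Test test test (test). Test (test test) test test.'));
// T t t (test). T (test test) t t.
```

Because \w does not match an apostrophe, a contraction is treated as two words: firstLetters("Don't go") gives "D't g".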

In this kind of case, there are three approaches:

  1. Use a regexp to find everything you want to keep, then paste all those pieces together.

  2. Use a regexp to find things you don't want to keep, then throw those away, by replacing them (which is what some other answers have done).

  3. Parse the string yourself, as one answer suggests.

We will consider regexp solutions. The key to writing regexps is to write down a narrative description of exactly what you want it to do. Then transform that into actual regexp syntax. Otherwise, your eyes will start bleeding as you randomly try one thing or another.

To find what you want to keep, the narrative description is:

Any parenthesized string (including preceding spaces) or space (or beginning of string) followed by a single letter, or punctuation.

To turn this into a regexp:

including preceding spaces:   \s*
any parenthesized string:     \(.*?\)
or:                           |
space or beginning of string: (^|\s+)
any letter:                   \w
punctuation:                  [.]

So the relevant regexp is /\s*\(.*?\)|(^|\s+)\w|[.]/ .

>> parts = str.match(/\s*\(.*?\)|(^|\s+)\w|[.]/g);
<< ["T", " t", " t", " (test)", ".", " T", " (test test)", " t", " t", "."]

>> parts.join('')
<< "T t t (test). T (test test) t t."
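The match-and-join steps can be sketched as one function. The name keepPieces is hypothetical, and the null check is an addition: match returns null rather than an empty array when nothing matches.

```javascript
// Hypothetical helper built from the "find what you want to keep" regexp.
function keepPieces(text) {
  // With the g flag, match returns the full text of every match and
  // ignores the (^|\s+) capture group.
  const parts = text.match(/\s*\(.*?\)|(^|\s+)\w|[.]/g);
  // match returns null when there is no match at all.
  return parts === null ? '' : parts.join('');
}

console.log(keepPieces('Test test test (test). Test (test test) test test.'));
// T t t (test). T (test test) t t.
```

Anything the regexp does not name is dropped, so with this pattern a comma disappears: keepPieces('Hi, you.') gives "H y.". Other punctuation would have to be added to the [.] class.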

If you want to adopt the opposite approach, which is to find the pieces you don't want to keep, for replacement by the empty string, then the narrative is

Any letter which is preceded by another letter, unless coming earlier there is an opening parenthesis with no intervening closing parenthesis.

The problem here is the "unless coming earlier" part, which in regexp terms is called a negative look-behind. Look-behind was not part of the JS flavor of regexp until ES2018, so it cannot be relied on in older engines.

This is why some of the other answers use a regexp which says "(1) first letter or entire parenthesized expression, (2) followed by more letters", and capture the (1) part. They then replace the entire match with (1), using the $1 back-reference, which has the effect of removing (2). That also works fine.

In other words, to throw away an A if preceded by a B , they match on (B)A and then replace the entire match with B .
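The two ways of "throwing away" can be compared side by side. This is a sketch, assuming an engine that implements ES2018 look-behind (current browsers and Node.js do); the variable names are illustrative only.

```javascript
const line = 'Test test test (test). Test (test test) test test.';

// Capture-and-replace: match (B)A, then put back only B through $1.
// B is the first letter or a whole parenthesized group, A is the rest.
const viaCapture = line.replace(/(\w|\([^)]+\))\w*/g, '$1');

// Look-behind: delete a word character that follows another word
// character, unless a "(" with no ")" after it precedes the character.
const viaLookbehind = line.replace(/(?<=\w)(?<!\([^)]*)\w/g, '');

console.log(viaCapture);    // T t t (test). T (test test) t t.
console.log(viaLookbehind); // T t t (test). T (test test) t t.
```

The look-behind version follows the narrative description almost word for word, while the capture version runs on engines that predate ES2018.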

Using split

For completeness, you could also consider the technique of splitting on spaces and punctuation and parenthesized expressions:

str = "Test (test). test";

>> pieces = str.split(/(\(.*?\)|\s+|[.])/);
<< ["Test", " ", "", "(test)", "", ".", "", " ", "test"]

// Remove empty strings
>> pieces = pieces . filter(Boolean)
<< ["Test", " ", "(test)", ".", " ", "test"]

// Take first letter if not parenthesized
>> pieces = pieces . map(function(piece) {
     return piece[0] === '(' ? piece : piece[0];
    });
<< ["T", " ", "(test)", ".", " ", "t"]

// Join them back together
>> pieces . join('')
<< "T (test). t"

The entire solution thus becomes

function abbreviate_words_outside_parentheses(str) {
  return str .
    split(/(\(.*?\)|\s+|[.])/) .
    filter(Boolean) .
    map(function(piece) { return piece[0] === '(' ? piece : piece[0];  }) .
    join('')
  ;
}

This procedural approach might be preferable if you think you may want to be doing additional kinds of transformations in the future, which might be hard to handle using the regexp.
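As an illustration of that flexibility, here is a sketch of a variant that also treats commas and other sentence punctuation as separate pieces and keeps runs of whitespace unchanged. The function name and the choice of punctuation characters are assumptions made for this example.

```javascript
// Hypothetical variant of the split-based solution above.
function abbreviateKeepingPunctuation(str) {
  return str
    // Split on parenthesized text, whitespace runs, or one punctuation
    // mark; the capturing group keeps the separators in the result.
    .split(/(\(.*?\)|\s+|[.,;:!?])/)
    // Drop the empty strings that split leaves between separators.
    .filter(Boolean)
    // Parenthesized pieces and whitespace runs pass through unchanged;
    // everything else is reduced to its first character.
    .map(piece => /^[(\s]/.test(piece) ? piece : piece[0])
    .join('');
}

console.log(abbreviateKeepingPunctuation('Wait, what?  (pause) Yes!'));
// W, w?  (pause) Y!
```

Each new rule is one more branch in the map callback, which is usually easier to read than one more alternative in a single regexp.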

To keep the regular expression simple, you could use the callback mechanism to keep track of the opening and closing parentheses:

 var t = 'Test test test (test). Test (test test) test test.';

 // keep track of open state and last index
 var s = { open: false, index: 0 };

 var res = t.replace(/\w+/g, function($0, index) {
   // update state
   for (var i = s.index; i < index; ++i) {
     if (t[i] == '(' || t[i] == ')') {
       s.open = !s.open; // assume balanced parentheses
     }
   }
   s.index = index;

   // return first letter if outside of parentheses
   return s.open ? $0 : $0[0];
 });

 console.log(res);
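The callback above toggles a single flag, so it relies on parentheses never being nested. A possible variant, sketched here with a depth counter in place of the flag, also copes with nesting; the function name is made up for this example, and the counter is a swapped-in technique rather than part of the answer.

```javascript
// Hypothetical variant: count nesting depth instead of toggling a flag.
function abbreviateOutsideParens(text) {
  let depth = 0;   // number of currently open parentheses
  let scanned = 0; // index up to which parentheses have been counted

  return text.replace(/\w+/g, (word, index) => {
    // Count the parentheses between the previous word and this one.
    for (let i = scanned; i < index; i++) {
      if (text[i] === '(') depth++;
      else if (text[i] === ')' && depth > 0) depth--;
    }
    scanned = index;
    // Keep the whole word inside parentheses, else only its first letter.
    return depth > 0 ? word : word[0];
  });
}

console.log(abbreviateOutsideParens('Test (a (b) c) test.'));
// T (a (b) c) t.
```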

You need to use capturing groups and lookahead assertions in order to achieve the result you expected.

> "Test test test (test). Test (test test) test test".replace(/(^[^\s(]|\s[^\s(])[^()\s]*(?=\s|$)/g, "$1")
'T t t (test). T (test test) t t'


  • (^[^\s(]|\s[^\s(]) captures the first character of each word, which must be neither whitespace nor ( , together with the whitespace character before it when there is one.

  • [^()\s]* matches the rest of the word: any run of characters other than ( , ) or whitespace.

  • (?=\s|$) is a positive lookahead asserting that the match is followed by whitespace or the end of the string, which in turn means that a complete word has been matched.
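Before adopting this pattern it is worth checking it against a few more inputs. The sketch below wraps it in a helper (the name abbreviate is an assumption) and shows two cases where the result may not be what is wanted.

```javascript
// The pattern from this answer, wrapped in a hypothetical helper.
const pattern = /(^[^\s(]|\s[^\s(])[^()\s]*(?=\s|$)/g;
const abbreviate = text => text.replace(pattern, '$1');

// The example from the answer.
console.log(abbreviate('Test test test (test). Test (test test) test test'));
// T t t (test). T (test test) t t

// Punctuation attached to a word counts as part of the word and is dropped.
console.log(abbreviate('Test test.'));
// T t

// A word in the middle of a three-word stage direction is surrounded by
// whitespace on both sides, so it is abbreviated as well.
console.log(abbreviate('(one two three)'));
// (one t three)
```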


 