
How to get substring(s) of a string if said substring(s) are between two specific characters

I am wondering how to extract words (substrings) from a string when those substrings sit between two specific characters. In my case, the start character is a whitespace and the end character is a comma, like so:

var str = "Hit that thing man! and a one, two, three, four, five, six, seven or eight";

Result:

var result = ["one", "two", "three", "four", "five", "six", "seven", "eight"];

I am wondering whether a regex can do this, or whether plain old JavaScript would be the more straightforward solution.

I have tried the following so far:

var result = str.split(/[,\s]+/);

But to no avail, since it gets the following wrong:

  1. Grabs the entire string before "one".
  2. Grabs the space before the desired word.

Bonus round: Can I include the last word, "eight", in the result by adding to the regex/JavaScript solution?

Any help is very appreciated!

TLDR: regex101.com

Why not just get all the matches? It seems simpler than splitting the string.

var re = /(?:^|\s)([^,\s]+)(?:,|$| or)/g,
    s = "Hit that thing man! and a one, two, three, four, five, six, seven or eight",
    m,
    matches = [];

// Call exec() repeatedly; with the g flag each call resumes after the previous match
do {
    m = re.exec(s);
    if (m) {
        matches.push(m[1]);
    }
} while (m);

// Log the collected words (m itself is null once the loop ends)
console.log(matches);

This produces:

["one", "two", "three", "four", "five", "six", "seven", "eight"]

If you don't want to match on " or", just remove it:

/(?:^|[\s])([^,\s]+)(?:,|$)/g

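You can also add " and", which often appears instead of " or" in such lists:

/(?:^|[\s])([^,\s]+)(?:,|$| or| and)/g

Note that on the sample sentence above this also captures "man!", because that word is followed by " and".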

The ^ and $ anchors allow matches at the beginning and end of the string.
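As an alternative to the exec loop, the same pattern can be fed to String.prototype.matchAll. This is only a sketch and assumes a runtime with ES2020 support; the helper name extractListWords is made up for illustration and is not part of the original answer.

```javascript
// Hypothetical helper; the pattern is the one given in the answer above.
function extractListWords(text) {
    const re = /(?:^|\s)([^,\s]+)(?:,|$| or)/g;
    // matchAll needs the g flag and yields one match object per hit;
    // index 1 of each match object is the captured word.
    return Array.from(text.matchAll(re), (m) => m[1]);
}

const sample = "Hit that thing man! and a one, two, three, four, five, six, seven or eight";
console.log(extractListWords(sample));
// ["one", "two", "three", "four", "five", "six", "seven", "eight"]
```

Because only the capture group is collected, the leading whitespace and the trailing comma or " or" never end up in the result.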

str.match(/\b[A-Za-z]+(?=, | or |$)/g)

It matches a word from its start if this word is followed by a comma, the word "or" or the end of the text.

You can try it here.
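The lookahead idea can be wrapped in a small function, sketched here under a few assumptions: the function name wordsBeforeSeparator is invented for this example, and the character class is spelled [A-Za-z] because the range A-z also covers the ASCII characters between Z and a, namely [ \ ] ^ _ and the backtick.

```javascript
// Hypothetical helper illustrating the lookahead approach.
function wordsBeforeSeparator(text) {
    // \b[A-Za-z]+    a run of ASCII letters starting at a word boundary
    // (?=, | or |$)  kept only if followed by ", ", " or ", or the end of the text
    // match() returns null when nothing matches, so fall back to an empty array.
    return text.match(/\b[A-Za-z]+(?=, | or |$)/g) || [];
}

const listText = "Hit that thing man! and a one, two, three, four, five, six, seven or eight";
console.log(wordsBeforeSeparator(listText));
// ["one", "two", "three", "four", "five", "six", "seven", "eight"]
```

The lookahead only inspects what follows the word, so the separators are not consumed and do not appear in the matches.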

The final "or" is the only actual problem, because JavaScript did not support lookbehinds when this was written (they were added to the language in ES2018). Without lookbehind, you cannot use a single regex match to capture only the words "between two specific characters": you always end up with at least the left character in your result.
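On engines that implement ES2018 lookbehind assertions, a single pattern can check both sides without consuming either. The following is a sketch under that assumption; the function name wordsBetween and the exact pattern are illustrative and do not come from the original answer.

```javascript
// Hypothetical helper; requires a regex engine with lookbehind support (ES2018+).
function wordsBetween(text) {
    // (?<=\s)       the word must be preceded by whitespace (not included in the match)
    // \w+           the word itself
    // (?=,| or |$)  and followed by a comma, " or ", or the end of the text
    return text.match(/(?<=\s)\w+(?=,| or |$)/g) || [];
}

const input = "Hit that thing man! and a one, two, three, four, five, six, seven or eight";
console.log(wordsBetween(input));
// ["one", "two", "three", "four", "five", "six", "seven", "eight"]
```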

I came up with this: massage the string into shape by replacing " or " with a comma and appending a comma to the end. Then it's a straightforward regex:

var result = str.concat(',').replace(' or ',',').match(/\w+(?=,)/g);

It cannot work with split alone, because that would lump the entire first part of the sentence together with "one".
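To make the split problem concrete, here is a short sketch. The separator pattern and the variable names are assumptions chosen for illustration; the last step shows one possible workaround, not something proposed in the original answer.

```javascript
const sentence = "Hit that thing man! and a one, two, three, four, five, six, seven or eight";

// Split on a comma (plus optional whitespace) or on " or ".
const rawParts = sentence.split(/,\s*|\s+or\s+/);
console.log(rawParts[0]);
// "Hit that thing man! and a one"

// Possible workaround: keep only the last whitespace-separated word of the first element.
const fixedParts = [rawParts[0].split(/\s+/).pop(), ...rawParts.slice(1)];
console.log(fixedParts);
// ["one", "two", "three", "four", "five", "six", "seven", "eight"]
```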
