
Regex match for beginning of multiple words in string

In JavaScript I want to be able to match strings that begin with a certain phrase. However, I want it to match the start of any word in the phrase, not just the beginning of the phrase.

For example:

Phrase: "This is the best"

Need to Match: "th"

Result: Matches Th and th

EDIT: \b works great, however it poses another issue:

It will also match characters that follow non-ASCII letters. For example, if my string is "Männ" and I search for "n", it will match the n after "Mä". Any ideas?

"This is the best moth".match(/\bth/gi);

or with a variable for your string

var string = "This is the best moth";
alert(string.match(/\bth/gi));

\b in a regex is a word boundary, so \bth will only match a th that is at the beginning of a word.

gi combines two flags: g for a global match (look for all occurrences) and i for case-insensitive matching.

(I threw moth in there as a reminder to check that it is not matched.)
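
If the search text comes from a variable rather than a literal, the same idea can be sketched with the RegExp constructor. The helper names escapeRegExp and matchWordStarts are hypothetical, and the escaping step is an assumption added so that characters such as . or ( in the prefix are treated literally.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return every occurrence of `prefix` at the start of a word.
function matchWordStarts(text, prefix) {
  // "\\b" in a string literal becomes \b (word boundary) in the pattern.
  var re = new RegExp("\\b" + escapeRegExp(prefix), "gi");
  return text.match(re) || []; // match() returns null when nothing is found
}

console.log(matchWordStarts("This is the best moth", "th")); // ["Th", "th"]
```

Returning an empty array instead of null keeps the calling code free of null checks.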



Edit:

So, the above only returns the part that you match ( th ). If you want to return the entire words, you have to match the entire word.

This is where things get tricky fast. First, with plain ASCII letters only:

string.match(/\bth\w*/gi);

To match the entire word, start at the word boundary \b, grab the th, then take the rest of the word with \w* (zero or more word characters). Because \w cannot match a space or punctuation, the match stops at the end of the word on its own. Note that inside a character class \b means the backspace character, so [^\b] means "anything except a backspace", not "non word boundary". A pattern such as /\bth[^\b]*?\b/gi still works, but only because the lazy quantifier *? stops at the first opportunity, which is the next word boundary \b.

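If you have non-ASCII letters like ä (written &auml; as an HTML entity), things get complicated really fast: in JavaScript, \b only treats A-Z, a-z, 0-9 and _ as word characters, so it sees a word boundary on either side of ä. You then have to use whitespace, or whitespace plus a defined set of characters that may appear at word boundaries.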

string.match(/\sth[^\s]*|^th[^\s]*/gi);


Since we're not using word boundaries, we have to take care of the beginning of the string separately ( |^ ).

The above will capture the white space at the beginning of words. Using \b will not capture white space, since \b has no width.
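
In engines that support the ES2018 regex features (lookbehind, and \p{...} property escapes with the u flag), a Unicode-aware word start can be sketched without the whitespace workaround. This assumes a reasonably modern runtime, which the patterns above do not require; the function name matchUnicodeWordStarts is hypothetical.

```javascript
// Hypothetical helper. Assumes an ES2018+ engine (lookbehind and \p{...} with the u flag).
function matchUnicodeWordStarts(text, prefix) {
  // Escape regex metacharacters so the prefix is matched literally.
  var escaped = prefix.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Letters, combining marks, digits and underscore in any script count as "word" characters.
  var wordChar = "[\\p{L}\\p{M}\\p{N}_]";
  // (?<!...) : the prefix must not be preceded by a word character.
  // wordChar* then extends the match to the rest of the word.
  var re = new RegExp("(?<!" + wordChar + ")" + escaped + wordChar + "*", "giu");
  return text.match(re) || [];
}

console.log(matchUnicodeWordStarts("Männ und nah", "n")); // ["nah"]
console.log(matchUnicodeWordStarts("This is the best moth", "th")); // ["This", "the"]
```

Since the lookbehind is zero-width, the returned words carry no leading white space, and the start of the string needs no separate alternative.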

Use this:

string.match(/^th|\sth/gi);

Examples:

'is this is a string'.match(/^th|\sth/gi);


'the string: This is a string'.match(/^th|\sth/gi);

Results:

[" th"]

["th", " Th"]
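
Because \s consumes the space, matches found through the \sth branch include it. One way to drop it is to put the part you want into a capture group. This is only a sketch, and it assumes that String.prototype.matchAll (ES2020) is available; the variable names are illustrative.

```javascript
var text = "the string: This is a string";
// (?:^|\s) consumes the start of the string or one whitespace character without capturing it;
// (th\S*) captures the word itself.
var words = Array.from(text.matchAll(/(?:^|\s)(th\S*)/gi), function (m) {
  return m[1];
});
console.log(words); // ["the", "This"]
```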

var matches = "This is the best".match(/\bth/ig);

returns:

["Th", "th"]

The regular expression means: match "th", ignoring case and globally (meaning, don't stop at just one match), wherever "th" sits at the start of a word. \b is a word boundary, so this covers the start of the string as well as a "th" preceded by a space or punctuation.

Use the g flag in the regex. It stands for "global", and it searches for all matches instead of only the first one.

You should also use the i flag for case-insensitive matching.

You add flags to the end of the regex ( /<regex>/<flags> ) or as a second parameter to new RegExp(pattern, flags)

For instance:

var matches = "This is the best".match(/\bth/gi);

or, using RegExp objects:

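var re = new RegExp("\\bth", "gi"); // the backslash is doubled inside a string literal
var matches = "This is the best".match(re); // re.exec() would return only one match per call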

EDIT: Use \b in the regex to match the boundary of a word. Note that it does not match any specific character, but the position at the beginning or end of a word or of the string.
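
If you prefer calling exec on the RegExp object, keep in mind that it returns one match per call. With the g flag the regex remembers its position in lastIndex, so repeated calls walk through the string. A minimal sketch that collects every match together with its position (variable names are illustrative):

```javascript
var re = new RegExp("\\bth", "gi");
var text = "This is the best";
var found = [];
var m;
// Each call resumes from re.lastIndex; exec returns null when no match is left.
while ((m = re.exec(text)) !== null) {
  found.push({ match: m[0], index: m.index });
}
console.log(found); // [{ match: "Th", index: 0 }, { match: "th", index: 8 }]
```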
