Regex match for beginning of multiple words in string

In JavaScript I want to be able to match strings that begin with a certain phrase. However, I want it to be able to match the start of any word in the phrase, not just the beginning of the phrase.

For example:

Phrase: "This is the best"

Need to Match: "th"

Result: Matches Th and th

EDIT: \b works great, but it raises another issue:

It will also match characters that follow non-ASCII letters. For example, if my string is "Männ" and I search for "n", it will match the n after Mä. Any ideas?

"This is the best moth".match(/\bth/gi);

or with a variable for your string

var string = "This is the best moth";
alert(string.match(/\bth/gi));

\b in a regex is a word boundary, so \bth will only match a th that is at the beginning of a word.

gi is for a global match (look for all occurrences) and case-insensitive matching.

(I threw moth in there as a reminder to check that it is not matched.)
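
If the prefix comes from user input rather than a literal, the pattern has to be built with the RegExp constructor. The following is a minimal sketch, assuming two hypothetical helpers named escapeRegExp and matchWordStarts (neither appears in the original answer):

```javascript
// Hypothetical helper: escape characters that are special in a regex,
// so that user input is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return every occurrence of `prefix` that starts a word.
function matchWordStarts(text, prefix) {
  var re = new RegExp("\\b" + escapeRegExp(prefix), "gi");
  return text.match(re) || []; // match() returns null when nothing is found
}

console.log(matchWordStarts("This is the best moth", "th")); // ["Th", "th"]
console.log(matchWordStarts("This is the best moth", "x"));  // []
```

Note the doubled backslash in "\\b": inside a string literal a single \b would be a backspace character, not a word boundary.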



Edit:

So, the above only returns the part that you match (th). If you want to return the entire words, you have to match the entire word.

This is where things get tricky fast. First, with no HTML entity letters:

string.match(/\bth[^\b]*?\b/gi);


To match the entire word, start at the word boundary \b, match th, then let [^\b]*? consume characters lazily until the next word boundary \b. Note that inside a character class \b does not mean "word boundary": [\b] is the backspace character, so [^\b] matches any character except a backspace. The * means zero or more of the preceding item, and the ? makes the match lazy. In other words, it doesn't expand as far as possible but stops at the first opportunity, which here is the end of the word.
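
A simpler way to grab the rest of the word is \w*, which consumes word characters until the word ends. The sketch below is an alternative to the pattern above, not the answerer's own; it compares the two on the sample string.

```javascript
var string = "This is the best moth";

// Pattern from the answer: lazy match up to the next word boundary.
var lazy = string.match(/\bth[^\b]*?\b/gi);

// Alternative: th at a word start, followed by any remaining word characters.
var greedy = string.match(/\bth\w*/gi);

console.log(lazy);   // ["This", "the"]
console.log(greedy); // ["This", "the"]
```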

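If you have non-ASCII characters like ä (the HTML entity &auml;), things get complicated really fast. \b is defined in terms of \w, which covers only ASCII letters, digits and the underscore, so ä counts as a non-word character and produces a word boundary in the middle of a word. Instead, you have to use whitespace, or whitespace plus a defined set of characters that may appear at word boundaries.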

string.match(/\sth[^\s]*|^th[^\s]*/gi);


Since we're not using word boundaries, we have to take care of the beginning of the string separately ( |^ ).
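
To see why the whitespace-based pattern helps, it can be compared with \b on a string containing ä. The sample string here is an assumption chosen for illustration, reusing the "Männ" case from the question.

```javascript
var text = "Männ nah";

// \b sees a boundary between "ä" (not in \w) and "n", so it matches inside "Männ".
var withBoundary = text.match(/\bn/gi);

// Requiring start-of-string or whitespace before the letter avoids that.
var withWhitespace = text.match(/\sn[^\s]*|^n[^\s]*/gi);

console.log(withBoundary);   // ["n", "n"]
console.log(withWhitespace); // [" nah"]
```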

The above will capture the white space at the beginning of words. Using \b will not capture white space, since \b has no width.
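
If the leading whitespace is unwanted, one option is to trim each result. Another, on engines that support lookbehind and Unicode property escapes (ES2018 and later), is a Unicode-aware stand-in for \b. Both are sketches and not part of the original answer; the sample string is an assumption.

```javascript
var text = "the Männ thing, then äther";

// Option 1: keep the whitespace-based pattern and trim each match.
var trimmed = (text.match(/\sth[^\s]*|^th[^\s]*/gi) || []).map(function (m) {
  return m.trim();
});

// Option 2: "not preceded by a letter, digit or underscore" as a zero-width,
// Unicode-aware replacement for \b at the start of a word.
var unicodeAware = text.match(/(?<![\p{L}\p{N}_])th[\p{L}\p{N}_]*/giu) || [];

console.log(trimmed);      // ["the", "thing,", "then"]
console.log(unicodeAware); // ["the", "thing", "then"]
```

Option 1 keeps trailing punctuation because [^\s] accepts anything that is not whitespace. Option 2 stops at the comma and skips the th inside "äther" because ä is a letter under \p{L}.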

Use this:

string.match(/^th|\sth/gi);

Examples:

'is this is a string'.match(/^th|\sth/gi);


'the string: This is a string'.match(/^th|\sth/gi);

Results (in the same order as the examples; note the leading space captured by \s):

[" th"]

["th", " Th"]

var matches = "This is the best".match(/\bth/ig);

returns:

["Th", "th"]

The regular expression means: match "th", ignoring case and globally (meaning, don't stop at just one match), wherever "th" sits at a word boundary, that is, at the start of the string or right after a non-word character such as a space.

Use the g flag in the regex. It stands for "global", I think, and it searches for all matches instead of only the first one.

You should also use the i flag for case-insensitive matching.

You add flags to the end of the regex ( /<regex>/<flags> ) or as a second parameter to new RegExp(pattern, flags)

For instance:

var matches = "This is the best".match(/\bth/gi);

or, using RegExp objects:

var re = new RegExp("\\bth", "gi");
// exec() returns only one match per call, so use match() to get all of them
var matches = "This is the best".match(re);
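
For completeness, exec() can also collect every match when called in a loop, since with the g flag each call resumes from re.lastIndex. A minimal sketch (the variable names found and m are illustrative):

```javascript
var re = new RegExp("\\bth", "gi");
var text = "This is the best";
var found = [];
var m;

// With the g flag, each exec() call resumes from re.lastIndex
// and returns null once there are no more matches.
while ((m = re.exec(text)) !== null) {
  found.push(m[0] + "@" + m.index);
}

console.log(found); // ["Th@0", "th@8"]
```

Unlike match(), exec() also exposes the index of each match, which is useful when the positions matter.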

EDIT: Use \b in the regex to match the boundary of a word. Note that it does not really match any specific character, but the beginning or end of a word or the string.
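
One way to see that \b matches a position rather than a character is to insert a marker at every boundary. A small sketch:

```javascript
// Every \b match is zero-width, so replace() inserts "|" without consuming text.
var marked = "This is the best".replace(/\b/g, "|");
console.log(marked); // "|This| |is| |the| |best|"

// The match itself is an empty string at a given index.
var first = /\b/.exec("This");
console.log(first[0].length, first.index); // 0 0
```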
