Regular expression: match a word of certain length which starts with certain letters

I need a regex that matches a 7-letter word starting with 'st'. For example, out of the following it should match only 'startin': start startin starting

General tips:

  • The starting symbols go into the regex directly, e.g. st . If the starting characters have a special meaning in regex syntax (dots, parentheses, etc.), you need to escape them with a backslash, but that is not needed in your case.

  • After the starting symbols, add a character class for the remaining characters of your "word". To allow any character (except a line break, by default), use a dot: . . To allow any non-whitespace character, use \S . To allow only (Unicode) letters, use \p{L} . To allow only non-accented Latin letters, use [A-Za-z] . There are many possibilities here. (Inside a string literal in a language such as Java, each backslash has to be doubled, e.g. \\S .)

  • Finally, add a repetition quantifier for the character class from the previous step. In your case, you need exactly 5 characters after st , so the quantifier is {5} .

  • If you want only the whole string to match, use \A at the beginning and \z at the end of your regex. Alternatively, put \b at the beginning and end of your regex to match at so-called word boundaries (the start/end of the string, or next to whitespace or punctuation). The most powerful alternative (with full control) is the so-called lookahead; I'll leave it out here for the sake of simplicity.
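The tips above can be put together roughly as follows. This is only a sketch in PHP (the language used further down the page); the variable names are illustrative and not from the original answer.

```php
<?php
// Building blocks from the tips above (names are hypothetical).
$prefix     = preg_quote('st', '/'); // escapes the prefix if it contains special characters
$charClass  = '[A-Za-z]';            // non-accented Latin letters only
$quantifier = '{5}';                 // 7 letters in total minus the 2-letter prefix

// Word-boundary version: finds matching words inside a longer text.
$wordPattern = '/\b' . $prefix . $charClass . $quantifier . '\b/';
preg_match_all($wordPattern, 'start startin starting', $found);
print_r($found[0]); // only "startin" is exactly 7 letters long

// Anchored version: the whole string has to be the 7-letter word.
$wholePattern = '/\A' . $prefix . $charClass . $quantifier . '\z/';
var_dump(preg_match($wholePattern, 'startin'));  // int(1)
var_dump(preg_match($wholePattern, 'starting')); // int(0)
```

Swapping $charClass for \S or \p{L} (the latter with the u modifier) changes which characters count as part of the word without touching the rest of the pattern.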

See this tutorial for details. You can simply search it for the keywords I've mentioned, e.g. repetition, character class, unicode, lookahead, etc.

To match words made of non-accented letters case-insensitively, you either need the i modifier or you have to spell out both cases for each of the two starting letters.

<?php

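    // ! is the pattern delimiter; i makes the match case-insensitive.
    $regex = '!\bst[a-z]{5}\b!i';
    $words = "start startin starting station Stalker SHOWER Staples Stiffle Steerin StÄbles'";
    // $matches[0] receives every full match found in $words.
    preg_match_all($regex,$words,$matches);
    print_r($matches[0]);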
?>

Output

Array
(
    [0] => startin
    [1] => station
    [2] => Stalker
    [3] => Staples
    [4] => Stiffle
    [5] => Steerin
)

Without the i modifier you would have to list both cases explicitly to get the same output as above:

$regex = '!\b[Ss][Tt][A-Za-z]{5}\b!';
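
As a quick sanity check, the two variants can be run side by side. This is a sketch on an ASCII-only version of the sample string, and the variable names are made up for the comparison.

```php
<?php
$words = "start startin starting station Stalker SHOWER Staples Stiffle Steerin";

// Variant 1: case-insensitive via the i modifier.
preg_match_all('!\bst[a-z]{5}\b!i', $words, $withModifier);

// Variant 2: no modifier, both cases listed explicitly.
preg_match_all('!\b[Ss][Tt][A-Za-z]{5}\b!', $words, $withoutModifier);

// Both variants should find the same words in the same order.
var_dump($withModifier[0] === $withoutModifier[0]); // bool(true)
```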

If you want to match Unicode characters as well, you can do this:

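print "<meta charset=\"utf-8\"><body>";

    // [a-z] covers ASCII letters; [^\x{0000}-\x{0080}] accepts any character
    // above U+0080 (not only letters). PCRE writes code points as \x{....}.
    // The u modifier makes PCRE treat the pattern and the subject as UTF-8.
    $regex = '!\bst([a-z]|[^\x{0000}-\x{0080}]){5}\b!iu';
    $words = "start startin starting station Stalker SHOWER Staples Stiffle Steerin StÄbles'";
    preg_match_all($regex,$words,$matches);
    print_r($matches[0]);

print "</body>";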

Output

Array
(
    [0] => startin
    [1] => station
    [2] => Stalker
    [3] => Staples
    [4] => Stiffle
    [5] => Steerin
    [6] => StÄbles // without UTF-8 output it looks like this -> StÃ"bles
)
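
If only letters should count (rather than every character above U+0080), the \p{L} class mentioned earlier can be used instead. A minimal sketch, assuming the script file itself is saved as UTF-8; the sample string here is shortened and partly made up:

```php
<?php
// \p{L} matches any Unicode letter; the u modifier is required so that
// PCRE reads the pattern and the subject as UTF-8.
$regex = '/\bst\p{L}{5}\b/iu';

// "Stäbchen" is an extra 8-letter word added for this illustration.
$words = "start startin starting station StÄbles Stäbchen";

preg_match_all($regex, $words, $matches);
print_r($matches[0]); // startin, station, StÄbles
```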
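Alternatively, using \w for the five remaining characters:

preg_match_all('/\bst\w{5}\b/', 'start startin starting', $arr, PREG_PATTERN_ORDER);

Update: this now uses word boundaries at both the start and the end, as suggested in the comments.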
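
One thing to keep in mind with \w is that it also accepts digits and the underscore, not just letters. The following sketch shows the difference; the test strings st12345 and st_abcd are made up for the illustration.

```php
<?php
$text = 'startin st12345 st_abcd';

// \w matches letters, digits and the underscore.
preg_match_all('/\bst\w{5}\b/', $text, $withWordChars);

// [A-Za-z] restricts the remaining five characters to letters.
preg_match_all('/\bst[A-Za-z]{5}\b/', $text, $lettersOnly);

print_r($withWordChars[0]); // startin, st12345, st_abcd
print_r($lettersOnly[0]);   // startin
```

If the input can contain such tokens and only real words are wanted, the letter class is the safer choice.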
