I need a regex that matches a 7-letter word starting with 'st'. For example, it should match only 'startin' out of the following: start startin starting
General tips:
The starting symbols go into the regex literally, e.g. st. If the starting characters have a special meaning in regex syntax (dots, parentheses, etc.), you need to escape them with a backslash, but that is not needed in your case.
After the starting symbols, add a character class for the remaining characters of your "word". To allow any character, use a dot (.). To allow any non-whitespace character, use \S. To allow only (Unicode) letters, use \p{L}. To allow only unaccented Latin letters, use [A-Za-z]. There are many possibilities here. (If your language requires backslashes to be escaped inside string literals, write \\S, \\p{L}, and so on.)
Finally, add a repetition quantifier for the character class from the previous step. In your case, you need exactly 5 characters after st, so the quantifier is {5}.
If you want the regex to match only the whole string, put \A at the beginning and \z at the end. Alternatively, put \b at the beginning and end to match at so-called word boundaries (including the start/end of the string, whitespace, punctuation). The most powerful alternative (with full control) is the so-called lookahead; I'll leave it out here for the sake of simplicity.
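The tips above can be assembled into a short PHP sketch. The variable names and the choice of [A-Za-z] as the character class are assumptions made for illustration; any of the other classes could be swapped in. The last pattern is one possible lookaround form of the idea, not necessarily the one the answer had in mind.

```php
<?php
// Sample input from the question.
$text = 'start startin starting';

// 'st' + exactly five ASCII letters, delimited by word boundaries.
preg_match_all('/\bst[A-Za-z]{5}\b/', $text, $byBoundary);
print_r($byBoundary[0]); // the only match is 'startin'

// \A ... \z: the pattern must span the entire subject string.
$whole = '/\Ast[A-Za-z]{5}\z/';
var_dump(preg_match($whole, 'startin'));  // int(1)
var_dump(preg_match($whole, 'starting')); // int(0)

// Lookaround: the match must be delimited by whitespace
// or a string edge on both sides.
preg_match_all('/(?<!\S)st\S{5}(?!\S)/', $text, $byLookaround);
print_r($byLookaround[0]); // again only 'startin'
```

The \b form is usually enough for picking words out of running text, while \A and \z fit better when validating a single input value.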
See this tutorial for details. You can just look for the specific keywords I've mentioned, e.g. repetition, character class, unicode, lookahead, etc.
To match words made of unaccented letters case-insensitively, either use the i modifier or spell out both cases of the leading letters (and of the character class) yourself.
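<?php
// '!' is the pattern delimiter; the trailing i makes the match case-insensitive.
$regex = '!\bst[a-z]{5}\b!i';
$words = "start startin starting station Stalker SHOWER Staples Stiffle Steerin StÄbles'";
// Collect every match; $matches[0] holds the full-pattern matches.
preg_match_all($regex,$words,$matches);
print_r($matches[0]);
?>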
Output
Array
(
[0] => startin
[1] => station
[2] => Stalker
[3] => Staples
[4] => Stiffle
[5] => Steerin
)
Without the i modifier, you would have to list both cases explicitly to get the same output as above:
$regex = '!\b[Ss][Tt][A-Za-z]{5}\b!';
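As a quick check, the two patterns can be run side by side. This is a minimal sketch: the variable names are made up for illustration, and the sample string is the ASCII part of the word list used above.

```php
<?php
$words = 'start startin starting station Stalker SHOWER Staples Stiffle Steerin';

// Case-insensitive flag versus explicit upper/lower character classes.
$withFlag    = '!\bst[a-z]{5}\b!i';
$withClasses = '!\b[Ss][Tt][A-Za-z]{5}\b!';

preg_match_all($withFlag, $words, $a);
preg_match_all($withClasses, $words, $b);

// Both patterns find the same six words.
var_dump($a[0] === $b[0]); // bool(true)
```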
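If you want to match Unicode characters as well, you can do this:
print "<meta charset=\"utf-8\"><body>";
// u enables UTF-8 mode; [^\x{0000}-\x{0080}] accepts any character above U+0080.
// (\p{L} would restrict this to letters only.)
$regex = '!\bst([a-z]|[^\x{0000}-\x{0080}]){5}\b!iu';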
$words = "start startin starting station Stalker SHOWER Staples Stiffle Steerin StÄbles'";
preg_match_all($regex,$words,$matches);
print_r($matches[0]);
print "</body>";
Output
Array
(
[0] => startin
[1] => station
[2] => Stalker
[3] => Staples
[4] => Stiffle
[5] => Steerin
[6] => StÄbles //without UTF-8 output it looks like this-> StÃ"bles
)
preg_match_all('/\bst\w{5}\b/', 'start startin starting', $arr, PREG_PATTERN_ORDER);
Update: now uses word boundaries at the start and end, per the comments.
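One thing to keep in mind with the \w version: \w also matches digits and the underscore, so it is looser than "7-letter word". The sketch below compares it with a letters-only class; the extra sample tokens st_1234 and stations are made up for the comparison.

```php
<?php
$text = 'start startin starting st_1234 stations';

// \w accepts letters, digits and the underscore.
preg_match_all('/\bst\w{5}\b/', $text, $wordChars);

// [A-Za-z] accepts ASCII letters only.
preg_match_all('/\bst[A-Za-z]{5}\b/', $text, $lettersOnly);

print_r($wordChars[0]);   // startin, st_1234
print_r($lettersOnly[0]); // startin
```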