I have a string line that looks like
A GOMUP 59/20 61/30 63/40 64/50 64/60 MUSVA DUTUM
I am trying to write a regex that matches this string and returns each of the non-whitespace tokens in an array. It also has to ensure that the first token is a single letter.
The Regex I have tried doesn't work how I would expect
#^([A-Z])(?:\s(\S+))+#
Returns
array(3) {
[0]=>
array(1) {
[0]=>
string(49) "A GOMUP 59/20 61/30 63/40 64/50 64/60 MUSVA DUTUM"
}
[1]=>
array(1) {
[0]=>
string(1) "A"
}
[2]=>
array(1) {
[0]=>
string(5) "DUTUM"
}
}
I expect/would like to return
array(10) {
[0]=>
array(1) {
[0]=>
string(49) "A GOMUP 59/20 61/30 63/40 64/50 64/60 MUSVA DUTUM"
}
[1]=>
array(1) {
[0]=>
string(1) "A"
}
[2]=>
array(1) {
[0]=>
string(5) "GOMUP"
}
[3]=>
array(1) {
[0]=>
string(5) "59/20"
}
[4]=>
array(1) {
[0]=>
string(5) "61/30"
}
[5]=>
array(1) {
[0]=>
string(5) "63/40"
}
[6]=>
array(1) {
[0]=>
string(5) "64/50"
}
[7]=>
array(1) {
[0]=>
string(5) "64/60"
}
[8]=>
array(1) {
[0]=>
string(5) "MUSVA"
}
[9]=>
array(1) {
[0]=>
string(5) "DUTUM"
}
}
How can this be achieved? I am using preg_match in PHP.
To split your string and check that the first item is a single letter at the same time, you can use this pattern:
$pattern = '~^[A-Z]\b|\G\s+\K\S+~';
$subject = 'A GOMUP 59/20 61/30 63/40 64/50 64/60 MUSVA DUTUM';
preg_match_all($pattern, $subject, $matches);
print_r($matches[0]);
You obtain:
Array
(
[0] => A
[1] => GOMUP
[2] => 59/20
[3] => 61/30
[4] => 63/40
[5] => 64/50
[6] => 64/60
[7] => MUSVA
[8] => DUTUM
)
If I test the string ZZ A GOMUP 59/20 61/30 63/40 64/50 64/60 MUSVA DUTUM
the pattern fails and no result is returned.
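As a quick check of that claim, here is a minimal sketch that runs the first pattern against the "ZZ ..." string. The variable names `$failing` and `$count` are only illustrative; they are not part of the original answer.

```php
<?php
// Pattern 1 from above, applied to a string whose first token has two letters.
$pattern = '~^[A-Z]\b|\G\s+\K\S+~';
$failing = 'ZZ A GOMUP 59/20 61/30 63/40 64/50 64/60 MUSVA DUTUM'; // illustrative name

// The first branch fails at offset 0 (no word boundary between "Z" and "Z"),
// and the \G branch can then never start, so nothing is matched at all.
$count = preg_match_all($pattern, $failing, $matches);

var_dump($count);      // int(0)
var_dump($matches[0]); // empty array
```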
However, you can find the first token that is a single letter, and everything after it, using this pattern:
$pattern = '~^(?>\S{2,}\s+)*\K[A-Z]\b|\G\s+\K\S+~';
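A short sketch of how this second pattern behaves on the "ZZ ..." string, assuming the same `preg_match_all` call as in the first example (the `$subject` value here is just the test string from above):

```php
<?php
// Pattern 2: skip leading tokens of two or more characters, then require
// a single uppercase letter, then collect the following tokens one by one.
$pattern = '~^(?>\S{2,}\s+)*\K[A-Z]\b|\G\s+\K\S+~';
$subject = 'ZZ A GOMUP 59/20 61/30 63/40 64/50 64/60 MUSVA DUTUM';

preg_match_all($pattern, $subject, $matches);

// "ZZ" is consumed before \K, so it is not part of the result.
print_r($matches[0]); // A, GOMUP, 59/20, 61/30, 63/40, 64/50, 64/60, MUSVA, DUTUM
```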
Pattern1 details: ~^[A-Z]\b|\G\s+\K\S+~
~ # pattern delimiter
^ # beginning of the string anchor
[A-Z]\b # single uppercase letter with a word boundary
| # OR
\G # anchor at the end of the previous match (forces contiguous matches)
\s+ # one or more whitespace characters (spaces, tabs, newlines...)
# which can be replaced by ' +' for your example string
\K # reset the start of the match (removes the spaces from the result)
\S+ # one or more non-whitespace characters
~ # pattern delimiter
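To illustrate why \G matters, the following sketch compares Pattern1 with a variant that drops \G. The variant and the variable names (`$withG`, `$withoutG`, `$input`) are made up for this comparison only.

```php
<?php
$input    = 'ZZ A GOMUP';             // first token is not a single letter
$withG    = '~^[A-Z]\b|\G\s+\K\S+~';  // Pattern1
$withoutG = '~^[A-Z]\b|\s+\K\S+~';    // hypothetical variant without \G

preg_match_all($withG, $input, $a);
preg_match_all($withoutG, $input, $b);

// With \G the second branch can only continue from a previous match,
// so a failed first token means no results at all.
print_r($a[0]); // empty array

// Without \G the second branch matches after any whitespace,
// so the check on the first token is silently bypassed.
print_r($b[0]); // A, GOMUP
```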
Pattern2 details: ~^(?>\S{2,}\s+)*\K[A-Z]\b|\G\s+\K\S+~
~ # pattern delimiter
^ # beginning of the string anchor
(?> # open a group (atomic here but you can use '(?:' instead)
\S{2,} # a non-whitespace character repeated two or more times
\s+ # one or more whitespace characters
)* # repeat the group zero or more times
\K # reset the beginning of the match
The rest is the same as Pattern1.
A repeated capturing group in a PHP (PCRE) regex keeps only its last iteration, so a single match cannot return a variable number of captures; you would have to write one group for every part of the string. See e.g. http://www.regular-expressions.info/captureall.html
It would be easier to split the string by whitespace with explode or preg_split, and only then do the additional checks.
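A minimal sketch of that split-then-check approach, assuming the only requirement is that the first token is a single uppercase letter. The function name `splitAndCheck` is hypothetical and not from the original answer.

```php
<?php
// Split on whitespace, then validate the first token separately.
// Returns the list of tokens, or null if the first token is not
// exactly one uppercase letter.
function splitAndCheck(string $line): ?array
{
    // PREG_SPLIT_NO_EMPTY drops empty pieces from leading/trailing whitespace.
    $tokens = preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY);

    if (!$tokens || !preg_match('/\A[A-Z]\z/', $tokens[0])) {
        return null;
    }
    return $tokens;
}

$ok  = splitAndCheck('A GOMUP 59/20 61/30 63/40 64/50 64/60 MUSVA DUTUM');
$bad = splitAndCheck('ZZ A GOMUP 59/20');

print_r($ok);   // A, GOMUP, 59/20, 61/30, 63/40, 64/50, 64/60, MUSVA, DUTUM
var_dump($bad); // NULL
```

Keeping the split and the validation separate makes it easy to add further per-token checks later without touching the regex.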
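$text = 'A GOMUP 59/20 61/30 63/40 64/50 64/60 MUSVA DUTUM';
// Match either a run of uppercase letters or a "digits/digits" pair.
// Note: this does not check that the first token is a single letter.
if (preg_match_all('#([A-Z]+)|(\d+/\d+)#', $text, $matches)) {
    print_r($matches[0]);
}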
Output:
Array
(
[0] => A
[1] => GOMUP
[2] => 59/20
[3] => 61/30
[4] => 63/40
[5] => 64/50
[6] => 64/60
[7] => MUSVA
[8] => DUTUM
)