I am working on a regex that will match many different types of of location coordinates. So far it matches about 90% of the formats:
([SNsn][\\s]*)?((?:[\\+-]?[0-9]*[\\.,][0-9]+)|(?:[\\+-]?[0-9]+))(?:(?:[^ms'′""″,\\.\\dNEWnew]?)|(?:[^ms'′""″,\\.\\dNEWnew]+((?:[\\+-]?[0-9]*[\\.,][0-9]+)|(?:[\\+-]?[0-9]+))(?:(?:[^ds°""″,\\.\\dNEWnew]?)|(?:[^ds°""″,\\.\\dNEWnew]+((?:[\\+-]?[0-9]*[\\.,][0-9]+)|(?:[\\+-]?[0-9]+))[^dm°'′,\\.\\dNEWnew]*))))([SNsn]?)[^\\dSNsnEWew]+([EWew][\\s]*)?((?:[\\+-]?[0-9]*[\\.,][0-9]+)|(?:[\\+-]?[0-9]+))(?:(?:[^ms'′""″,\\.\\dNEWnew]?)|(?:[^ms'′""″,\\.\\dNEWnew]+((?:[\\+-]?[0-9]*[\\.,][0-9]+)|(?:[\\+-]?[0-9]+))(?:(?:[^ds°""″,\\.\\dNEWnew]?)|(?:[^ds°""″,\\.\\dNEWnew]+((?:[\\+-]?[0-9]*[\\.,][0-9]+)|(?:[\\+-]?[0-9]+))[^dm°'′,\\.\\dNEWnew]*))))([EWew]?)
Testing the formats:
N 45° 55.732 W 122° 29.882
N 047° 38.938', W 122° 20.887'
40.123, -74.123
40.123° N 74.123° W
40° 7´ 22.8" N 74° 7´ 22.8" W
40° 7.38' , -74° 7.38'
N40°7'22.8, W74°7'22.8"
40°7'22.8"N, 74°7'22.8"W
40 7 22.8, -74 7 22.8
40.123 -74.123
40.123°,-74.123°
144442800, -266842800
40.123N74.123W
4007.38N7407.38W
40°7'22.8"N, 74°7'22.8"W
400722.8N740722.8W
N 40 7.38 W 74 7.38
40:7:23N,74:7:23W
40:7:22.8N 74:7:22.8W
40°7'23"N 74°7'23"W
40°7'23" -74°7'23"
40d 7' 23" N 74d 7' 23" W
40.123N 74.123W
40° 7.38, -74° 7.38
Testing if it works: https://regexr.com/3ivu2
As you can see there are issues with the spaces and commas that are causing the regex to not match some of these formats.
I am trying to match the coordinate strings so that they can be highlighted in my iOS
app and allow the user to tap them.
What can I do to update the regex and fix the matching issues?
I'm sure there are many ways to go about this. Since you haven't specified a regex engine or programming language, I'll post one that works in PCRE and what that should work in most engines. The PCRE regex is much easier to understand than the non-PCRE regex, but both use the exact same logic.
The patterns defined below match each string you've presented in your question and properly separates each part of the coordinate (x, y).
This method uses the DEFINE
construct to pre-define patterns. The beauty of this construct is that you can define reusable parts of your regex in one location, thus, you can edit most of the regex just by editing these subpatterns.
(?(DEFINE)
(?<ns>[ns])
(?<ew>[ew])
(?<d>[°´’'"d:])
(?<n>[+-]?\d+(?:\.\d+)?)
)
(
(?&ns)?
(?:\ ?(?&n)(?&d)?){1,3}
\ ?(?&ns)?
)
\ ?,?\ ?
(
(?&ew)?
(?:\ ?(?&n)(?&d)?){1,3}
\ ?(?&ew)?
)
Flags: gix
(
[ns]?
(?:\ ?[+-]?\d+(?:\.\d+)?[°´’'"d:]?){1,3}
\ ?[ns]?
)
\ ?,?\ ?
(
[ew]?
(?:\ ?[+-]?\d+(?:\.\d+)?[°´’'"d:]?){1,3}
\ ?[ew]?
)
Flags: gix
.
Some engines don't have the x
flag. For those engines you can use the following one-liner ( as seen here ):
([ns]?(?: ?[+-]?\d+(?:\.\d+)?[°´’'"d:]?){1,3} ?[ns]?) ?,? ?([ew]?(?: ?[+-]?\d+(?:\.\d+)?[°´’'"d:]?){1,3} ?[ew]?)
Since both patterns are essentially the same (non-PCRE is just an expanded version of the PCRE), I'll define the PCRE regex pattern since it's easier to grasp.
Note that the patterns that use x
have escaped spaces since they would otherwise be ignored ( x
ignores whitespace within the pattern). The i
flag allows us to match text regardless of case ( i
makes our pattern case-insensitive).
(?(DEFINE)...)
The DEFINE
group is completely ignored by regex. It gets treated as a var name=value
, whereas you can recall the specific pattern for use via its name. (?<ns>[ns])
The group ns
matches any character in the set nsNS
(?<ew>[ew])
The group ew
matches any character in the set ewEW
(?<d>[°´''"d:])
The group d
matches any character in the set °´''"d:
(?<n>[+-]?\\d+(?:\\.\\d+)?)
The group n
matches any number that matches the following structure
[+-]?
Optionally match any character in the set +-
\\d+
Match one or more digits (?:\\.\\d+)?
Optionally match a decimal point followed by one or more digits The pattern is composed of 3 larger parts. The first and last are capture groups (the coordinates themselves) and the second is what separates the two.
(?&ns)?
Optionally match the group ns
(?:\\ ?(?&n)(?&d)?){1,3}
Matches [an optional space, followed by the group n
then optionally group d
] between one and three times \\ ?(?&ns)?
Optionally match a space, optionally match the group ns
\\ ?,?\\ ?
Match an optional space, comma and space (this separates each coordinate part) ns
with the group ew
This simplified regex literally matches all the patterns you've given:
^((?:[NW]? ?(?:[-\d.d]+[NW:°´’'",]?[ NW]?)+[, ]*)+[NW]?)$
I'm not an expert for coordinates, but you can modify it easily if I didn't take into account some specifics.
A full test is here .
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.