
Regular Expression to match many coordinate formats

I am working on a regex that will match many different types of location coordinates. So far it matches about 90% of the formats:

([SNsn][\\s]*)?((?:[\\+-]?[0-9]*[\\.,][0-9]+)|(?:[\\+-]?[0-9]+))(?:(?:[^ms'′""″,\\.\\dNEWnew]?)|(?:[^ms'′""″,\\.\\dNEWnew]+((?:[\\+-]?[0-9]*[\\.,][0-9]+)|(?:[\\+-]?[0-9]+))(?:(?:[^ds°""″,\\.\\dNEWnew]?)|(?:[^ds°""″,\\.\\dNEWnew]+((?:[\\+-]?[0-9]*[\\.,][0-9]+)|(?:[\\+-]?[0-9]+))[^dm°'′,\\.\\dNEWnew]*))))([SNsn]?)[^\\dSNsnEWew]+([EWew][\\s]*)?((?:[\\+-]?[0-9]*[\\.,][0-9]+)|(?:[\\+-]?[0-9]+))(?:(?:[^ms'′""″,\\.\\dNEWnew]?)|(?:[^ms'′""″,\\.\\dNEWnew]+((?:[\\+-]?[0-9]*[\\.,][0-9]+)|(?:[\\+-]?[0-9]+))(?:(?:[^ds°""″,\\.\\dNEWnew]?)|(?:[^ds°""″,\\.\\dNEWnew]+((?:[\\+-]?[0-9]*[\\.,][0-9]+)|(?:[\\+-]?[0-9]+))[^dm°'′,\\.\\dNEWnew]*))))([EWew]?)

Testing the formats:

N 45° 55.732 W 122° 29.882

N 047° 38.938', W 122° 20.887'

40.123, -74.123

40.123° N 74.123° W

40° 7´ 22.8" N 74° 7´ 22.8" W

40° 7.38' , -74° 7.38'

N40°7'22.8, W74°7'22.8"

40°7'22.8"N, 74°7'22.8"W

40 7 22.8, -74 7 22.8

40.123 -74.123

40.123°,-74.123°

144442800, -266842800

40.123N74.123W

4007.38N7407.38W

40°7'22.8"N, 74°7'22.8"W

400722.8N740722.8W

N 40 7.38 W 74 7.38

40:7:23N,74:7:23W

40:7:22.8N 74:7:22.8W

40°7'23"N 74°7'23"W

40°7'23" -74°7'23"

40d 7' 23" N 74d 7' 23" W

40.123N 74.123W

40° 7.38, -74° 7.38

Testing if it works: https://regexr.com/3ivu2


As you can see there are issues with the spaces and commas that are causing the regex to not match some of these formats.

I am trying to match the coordinate strings so that they can be highlighted in my iOS app and allow the user to tap them.

What can I do to update the regex and fix the matching issues?

Overview

I'm sure there are many ways to go about this. Since you haven't specified a regex engine or programming language, I'll post one pattern that works in PCRE and one that should work in most engines. The PCRE regex is much easier to understand than the non-PCRE regex, but both use exactly the same logic.

The patterns defined below match each string you've presented in your question and separate the two parts of the coordinate (x, y) into their own capture groups.


Code

PCRE

This method uses the DEFINE construct to pre-define patterns. The beauty of this construct is that you can define the reusable parts of your regex in one location, so you can edit most of the regex just by editing these subpatterns.

See regex in use here

(?(DEFINE)
  (?<ns>[ns])
  (?<ew>[ew])
  (?<d>[°´’'"d:])
  (?<n>[+-]?\d+(?:\.\d+)?)
)
(
  (?&ns)?
  (?:\ ?(?&n)(?&d)?){1,3}
  \ ?(?&ns)?
)
\ ?,?\ ?
(
  (?&ew)?
  (?:\ ?(?&n)(?&d)?){1,3}
  \ ?(?&ew)?
)

Flags: gix

Non-PCRE

See regex in use here

(
  [ns]?
  (?:\ ?[+-]?\d+(?:\.\d+)?[°´’'"d:]?){1,3}
  \ ?[ns]?
)
\ ?,?\ ?
(
  [ew]?
  (?:\ ?[+-]?\d+(?:\.\d+)?[°´’'"d:]?){1,3}
  \ ?[ew]?
)

Flags: gix

Some engines don't have the x flag. For those engines you can use the following one-liner (as seen here):

([ns]?(?: ?[+-]?\d+(?:\.\d+)?[°´’'"d:]?){1,3} ?[ns]?) ?,? ?([ew]?(?: ?[+-]?\d+(?:\.\d+)?[°´’'"d:]?){1,3} ?[ew]?)
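As a rough illustration of how the one-liner could be used to find coordinates inside longer text (for example, to decide which spans to highlight), here is a minimal Python sketch. The names `COORD_RE` and `find_coordinates` are made up for this example, and Python's `re` module stands in for whichever engine you actually use; the i flag becomes `re.IGNORECASE`, and `finditer` plays the role of the g flag.

```python
import re

# One-line form of the pattern above, split across two raw strings for readability.
# The i flag is expressed as re.IGNORECASE.
COORD_RE = re.compile(
    r"""([ns]?(?: ?[+-]?\d+(?:\.\d+)?[°´’'"d:]?){1,3} ?[ns]?) ?,? ?"""
    r"""([ew]?(?: ?[+-]?\d+(?:\.\d+)?[°´’'"d:]?){1,3} ?[ew]?)""",
    re.IGNORECASE,
)

def find_coordinates(text):
    """Return every coordinate-like substring found in text (hypothetical helper)."""
    # The pattern allows optional spaces at its edges, so trim them from each match.
    return [m.group(0).strip() for m in COORD_RE.finditer(text)]

print(find_coordinates("Meet at 40.123, -74.123 today"))
```

Because the hemisphere letters are optional and case-insensitive, a word ending in n or s directly before a number (or starting with e or w directly after one) can be pulled into the match, so surrounding text may need extra checks.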

Explanation

Since both patterns are essentially the same (the non-PCRE version is just the PCRE version with its subpatterns expanded inline), I'll explain the PCRE pattern since it's easier to grasp.

Note that the patterns that use x have escaped spaces since they would otherwise be ignored (x ignores whitespace within the pattern). The i flag allows us to match text regardless of case (i makes our pattern case-insensitive).

DEFINE

  • (?(DEFINE)...) The DEFINE group matches nothing by itself; the engine skips over it while matching. It works like a set of variable declarations (name=value): each named subpattern inside it can be called later by name, e.g. (?&ns).
  • (?<ns>[ns]) The group ns matches any character in the set nsNS
  • (?<ew>[ew]) The group ew matches any character in the set ewEW
  • (?<d>[°´’'"d:]) The group d matches any character in the set °´’'"d:
  • (?<n>[+-]?\d+(?:\.\d+)?) The group n matches any number that has the following structure
    • [+-]? Optionally match any character in the set +-
    • \d+ Match one or more digits
    • (?:\.\d+)? Optionally match a decimal point followed by one or more digits
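Engines without DEFINE can get a similar "edit in one place" effect by building the pattern from named string fragments. The sketch below does this in Python, whose standard `re` module supports neither (?(DEFINE)...) nor (?&name) calls. The constant names mirror the group names above, and `coordinate_part` is a hypothetical helper introduced only for this illustration.

```python
import re

# Reusable fragments, mirroring the subpatterns in the DEFINE group.
NS = r"[ns]"                   # latitude hemisphere letter
EW = r"[ew]"                   # longitude hemisphere letter
D = r"""[°´’'"d:]"""           # degree / minute / second marker
N = r"[+-]?\d+(?:\.\d+)?"      # optionally signed integer or decimal number

def coordinate_part(hemisphere):
    """Build one captured half of the pattern (hypothetical helper)."""
    # {{1,3}} is an escaped {1,3} quantifier inside the f-string.
    return rf"({hemisphere}?(?: ?{N}{D}?){{1,3}} ?{hemisphere}?)"

# Same structure as the one-liner: latitude, separator, longitude.
PATTERN = re.compile(
    coordinate_part(NS) + r" ?,? ?" + coordinate_part(EW),
    re.IGNORECASE,
)

m = PATTERN.fullmatch("40°7'22.8\"N, 74°7'22.8\"W")
print(m.group(1), "|", m.group(2))
```

Changing the set of separator characters in `D`, for instance, then updates both halves of the pattern at once, which is the same benefit the DEFINE group gives in PCRE.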

Pattern

The pattern is composed of 3 larger parts. The first and last are capture groups (the coordinates themselves) and the second is what separates the two.

  • Capture 1:
    • (?&ns)? Optionally match the group ns
    • (?:\ ?(?&n)(?&d)?){1,3} Matches [an optional space, followed by the group n then optionally group d ] between one and three times
    • \ ?(?&ns)? Optionally match a space, optionally match the group ns
  • \ ?,?\ ? Match an optional space, comma and space (this separates each coordinate part)
  • Capture 2: This is the same as Capture 1 but replaces the group ns with the group ew
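To make the three parts concrete, the following sketch runs the non-PCRE pattern in free-spacing mode and reads the two capture groups back out. It assumes Python's `re` module, where the x flag is `re.VERBOSE` and an escaped space (`\ `) is still a literal space; `COORD_VERBOSE` and `split_coordinate` are illustrative names, not part of the pattern itself.

```python
import re

COORD_VERBOSE = re.compile(
    r"""
    (                                            # capture 1: latitude part
      [ns]?                                      # optional leading hemisphere letter
      (?:\ ?[+-]?\d+(?:\.\d+)?[°´’'"d:]?){1,3}   # one to three numbers, each with an optional marker
      \ ?[ns]?                                   # optional trailing hemisphere letter
    )
    \ ?,?\ ?                                     # separator: optional space, comma, space
    (                                            # capture 2: longitude part
      [ew]?
      (?:\ ?[+-]?\d+(?:\.\d+)?[°´’'"d:]?){1,3}
      \ ?[ew]?
    )
    """,
    re.IGNORECASE | re.VERBOSE,
)

def split_coordinate(text):
    """Return (latitude_text, longitude_text), or None if text is not a coordinate."""
    m = COORD_VERBOSE.fullmatch(text)
    if m is None:
        return None
    # Either group may carry an optional edge space, so trim both.
    return m.group(1).strip(), m.group(2).strip()

print(split_coordinate("N 047° 38.938', W 122° 20.887'"))
```

One caveat worth checking in your own engine: when nothing but a space separates the two numbers and there is no hemisphere letter (as in `40.123 -74.123`), the greedy `{1,3}` repetition can absorb most of the second number into capture 1, so the string still matches as a whole but the groups may not split where you expect.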

This simplified regex literally matches all the patterns you've given:

^((?:[NW]? ?(?:[-\d.d]+[NW:°´’'",]?[ NW]?)+[, ]*)+[NW]?)$

I'm not an expert for coordinates, but you can modify it easily if I didn't take into account some specifics.

A full test is here.
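Because the simplified pattern is anchored with ^ and $, it validates whole lines rather than finding coordinates inside running text. A small Python sketch of that usage, assuming one candidate per line and using `re.MULTILINE` so the anchors apply to each line (`SIMPLE_RE` and `coordinate_lines` are names invented for this example):

```python
import re

# The simplified pattern, unchanged; re.MULTILINE lets ^ and $ match at every line.
SIMPLE_RE = re.compile(
    r"""^((?:[NW]? ?(?:[-\d.d]+[NW:°´’'",]?[ NW]?)+[, ]*)+[NW]?)$""",
    re.MULTILINE,
)

def coordinate_lines(text):
    """Return the lines of text that the simplified pattern accepts in full."""
    # findall returns the contents of the single capture group for each match.
    return SIMPLE_RE.findall(text)

sample = "40.123, -74.123\nnot a coordinate\n40:7:23N,74:7:23W"
print(coordinate_lines(sample))
```

As written, the pattern is case-sensitive and only knows the letters N and W, so inputs using S, E, or lowercase hemisphere letters would need those added to the character classes.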


 