

Regular Expression to match many coordinate formats

I am working on a regex that will match many different types of location coordinates. So far it matches about 90% of the formats:

([SNsn][\\s]*)?((?:[\\+-]?[0-9]*[\\.,][0-9]+)|(?:[\\+-]?[0-9]+))(?:(?:[^ms'′""″,\\.\\dNEWnew]?)|(?:[^ms'′""″,\\.\\dNEWnew]+((?:[\\+-]?[0-9]*[\\.,][0-9]+)|(?:[\\+-]?[0-9]+))(?:(?:[^ds°""″,\\.\\dNEWnew]?)|(?:[^ds°""″,\\.\\dNEWnew]+((?:[\\+-]?[0-9]*[\\.,][0-9]+)|(?:[\\+-]?[0-9]+))[^dm°'′,\\.\\dNEWnew]*))))([SNsn]?)[^\\dSNsnEWew]+([EWew][\\s]*)?((?:[\\+-]?[0-9]*[\\.,][0-9]+)|(?:[\\+-]?[0-9]+))(?:(?:[^ms'′""″,\\.\\dNEWnew]?)|(?:[^ms'′""″,\\.\\dNEWnew]+((?:[\\+-]?[0-9]*[\\.,][0-9]+)|(?:[\\+-]?[0-9]+))(?:(?:[^ds°""″,\\.\\dNEWnew]?)|(?:[^ds°""″,\\.\\dNEWnew]+((?:[\\+-]?[0-9]*[\\.,][0-9]+)|(?:[\\+-]?[0-9]+))[^dm°'′,\\.\\dNEWnew]*))))([EWew]?)

Testing the formats:

N 45° 55.732 W 122° 29.882
N 047° 38.938', W 122° 20.887'
40.123, -74.123
40.123° N 74.123° W
40° 7´ 22.8" N 74° 7´ 22.8" W
40° 7.38' , -74° 7.38'
N40°7'22.8, W74°7'22.8"
40°7'22.8"N, 74°7'22.8"W
40 7 22.8, -74 7 22.8
40.123 -74.123
40.123°,-74.123°
144442800, -266842800
40.123N74.123W
4007.38N7407.38W
40°7'22.8"N, 74°7'22.8"W
400722.8N740722.8W
N 40 7.38 W 74 7.38
40:7:23N,74:7:23W
40:7:22.8N 74:7:22.8W
40°7'23"N 74°7'23"W
40°7'23" -74°7'23"
40d 7' 23" N 74d 7' 23" W
40.123N 74.123W
40° 7.38, -74° 7.38

Testing if it works: https://regexr.com/3ivu2


As you can see, there are issues with the spaces and commas that cause the regex to miss some of these formats.

I am trying to match the coordinate strings so that they can be highlighted in my iOS app and allow the user to tap them.

What can I do to update the regex and fix the matching issues?

Overview

I'm sure there are many ways to go about this. Since you haven't specified a regex engine or programming language, I'll post one pattern that works in PCRE and one that should work in most engines. The PCRE regex is much easier to understand than the non-PCRE regex, but both use exactly the same logic.

The patterns defined below match each string you've presented in your question and properly separate each part of the coordinate (x, y).


Code

PCRE

This method uses the DEFINE construct to pre-define patterns. The beauty of this construct is that you can define reusable parts of your regex in one location, so you can change most of the regex just by editing these subpatterns.

See regex in use here

(?(DEFINE)
  (?<ns>[ns])
  (?<ew>[ew])
  (?<d>[°´’'"d:])
  (?<n>[+-]?\d+(?:\.\d+)?)
)
(
  (?&ns)?
  (?:\ ?(?&n)(?&d)?){1,3}
  \ ?(?&ns)?
)
\ ?,?\ ?
(
  (?&ew)?
  (?:\ ?(?&n)(?&d)?){1,3}
  \ ?(?&ew)?
)

Flags: gix

Non-PCRE

See regex in use here

(
  [ns]?
  (?:\ ?[+-]?\d+(?:\.\d+)?[°´’'"d:]?){1,3}
  \ ?[ns]?
)
\ ?,?\ ?
(
  [ew]?
  (?:\ ?[+-]?\d+(?:\.\d+)?[°´’'"d:]?){1,3}
  \ ?[ew]?
)

Flags: gix
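As a rough check of the non-PCRE version on a few of the listed formats, here is a minimal Python sketch. Python is my choice, not the question's (the app itself is iOS); re.VERBOSE and re.IGNORECASE stand in for the x and i flags. split_coordinate is a hypothetical helper, and it trims the optional trailing space that Capture 1 can pick up.

```python
import re

# Non-PCRE pattern from the answer, compiled with Python's re module.
# re.VERBOSE plays the role of the x flag and re.IGNORECASE the role of i.
COORD = re.compile(r"""
(
  [ns]?
  (?:\ ?[+-]?\d+(?:\.\d+)?[°´’'"d:]?){1,3}
  \ ?[ns]?
)
\ ?,?\ ?
(
  [ew]?
  (?:\ ?[+-]?\d+(?:\.\d+)?[°´’'"d:]?){1,3}
  \ ?[ew]?
)
""", re.IGNORECASE | re.VERBOSE)


def split_coordinate(text):
    """Return (latitude_part, longitude_part) for the first match, or None."""
    m = COORD.search(text)
    if m is None:
        return None
    # Capture 1 may end with the optional space before the second part, so trim both.
    return m.group(1).strip(), m.group(2).strip()


for sample in ["N 45° 55.732 W 122° 29.882", '40°7\'22.8"N, 74°7\'22.8"W', "40.123, -74.123"]:
    print(split_coordinate(sample))
```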

Some engines don't have the x flag. For those engines you can use the following one-liner (as seen here):

([ns]?(?: ?[+-]?\d+(?:\.\d+)?[°´’'"d:]?){1,3} ?[ns]?) ?,? ?([ew]?(?: ?[+-]?\d+(?:\.\d+)?[°´’'"d:]?){1,3} ?[ew]?)
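For the highlighting use case in the question, what the app ultimately needs is the position of each match. The sketch below (Python, assumed rather than taken from the answer) compiles the one-liner with only re.IGNORECASE, uses re.finditer in place of the g flag, and returns start/end offsets through a hypothetical coordinate_spans helper. The sample has one coordinate per line; since the pattern only allows literal spaces between parts, a match cannot run across a newline.

```python
import re

# The one-liner from the answer: literal spaces instead of "\ ", so no x flag is needed.
ONE_LINER = re.compile(
    r"""([ns]?(?: ?[+-]?\d+(?:\.\d+)?[°´’'"d:]?){1,3} ?[ns]?) ?,? ?([ew]?(?: ?[+-]?\d+(?:\.\d+)?[°´’'"d:]?){1,3} ?[ew]?)""",
    re.IGNORECASE,
)


def coordinate_spans(text):
    """Return (start, end, matched_text) for every match, e.g. to highlight them."""
    return [(m.start(), m.end(), m.group(0)) for m in ONE_LINER.finditer(text)]


sample = "40.123N74.123W\n40:7:23N,74:7:23W\n40.123, -74.123"
for start, end, found in coordinate_spans(sample):
    print(start, end, found)
```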

Explanation

Since both patterns are essentially the same (the non-PCRE pattern is just the PCRE one with each named group expanded inline), I'll explain the PCRE pattern since it's easier to grasp.

Note that the patterns that use x have escaped spaces, since they would otherwise be ignored (x ignores whitespace within the pattern). The i flag allows us to match text regardless of case (i makes the pattern case-insensitive).
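Both flags are easy to check in Python, where re.VERBOSE and re.IGNORECASE play the roles of x and i. This is a minimal sketch; Python is used only because no engine was specified, and the small patterns are made up for the demonstration.

```python
import re

# With re.VERBOSE (the x flag), an unescaped space in the pattern is ignored...
spaces_ignored = re.compile(r"N 40", re.VERBOSE)
# ...while an escaped space "\ " still matches a literal space.
space_kept = re.compile(r"N\ 40", re.VERBOSE)
# re.IGNORECASE (the i flag) lets [ns] match "N" and "S" as well.
hemisphere = re.compile(r"[ns]", re.IGNORECASE)

print(spaces_ignored.fullmatch("N40") is not None)
print(spaces_ignored.fullmatch("N 40") is not None)
print(space_kept.fullmatch("N 40") is not None)
print(hemisphere.fullmatch("S") is not None)
```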

DEFINE

  • (?(DEFINE)...) The DEFINE group is never matched on its own; the engine skips it. It works like a var name=value declaration, so each named pattern inside it can be recalled later by name with (?&name).
  • (?<ns>[ns]) The group ns matches any character in the set nsNS (the i flag makes [ns] case-insensitive)
  • (?<ew>[ew]) The group ew matches any character in the set ewEW
  • (?<d>[°´’'"d:]) The group d matches any character in the set °´’'"d: (and D, because of the i flag)
  • (?<n>[+-]?\d+(?:\.\d+)?) The group n matches any number with the following structure:
    • [+-]? Optionally match any character in the set +-
    • \d+ Match one or more digits
    • (?:\.\d+)? Optionally match a decimal point followed by one or more digits
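To see what the n subpattern does and does not accept, here is a small Python check (a sketch; is_number is a hypothetical name). It only treats . as a decimal separator and needs at least one digit before it, so tokens like .5 or 40,5, which the question's own regex allowed through [0-9]*[\\.,][0-9]+, are not a single number here.

```python
import re

# The answer's n subpattern: optional sign, digits, optional ".digits" fraction.
NUMBER = re.compile(r"[+-]?\d+(?:\.\d+)?")


def is_number(token):
    """True if the whole token is a number in the sense of group n."""
    return NUMBER.fullmatch(token) is not None


for token in ["40.123", "-74.123", "+7", "144442800", ".5", "40.", "40,5"]:
    print(token, is_number(token))
```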

Pattern

The pattern is composed of three larger parts. The first and last are capture groups (the coordinates themselves), and the second is what separates the two.

  • Capture 1:
    • (?&ns)? Optionally match the group ns
    • (?:\ ?(?&n)(?&d)?){1,3} Match [an optional space, followed by the group n, then optionally the group d] between one and three times
    • \ ?(?&ns)? Optionally match a space, then optionally match the group ns
  • \ ?,?\ ? Match an optional space, an optional comma, and an optional space (this separates the two coordinate parts)
  • Capture 2: This is the same as Capture 1 but replaces the group ns with the group ew
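Python's built-in re module supports neither (?(DEFINE)...) nor (?&name) calls, so the DEFINE idea can't be used there directly. A minimal sketch of one workaround, assuming you only need the reuse and not the engine feature: keep the subpatterns as ordinary strings and build Capture 1, the separator, and Capture 2 from them. PARTS and axis are hypothetical names; assembled this way, the result is character-for-character the one-liner given earlier.

```python
import re

# Stand-in for the DEFINE block: named subpatterns kept as plain strings.
PARTS = {
    "ns": r"[ns]",
    "ew": r"[ew]",
    "d": r"""[°´’'"d:]""",
    "n": r"[+-]?\d+(?:\.\d+)?",
}


def axis(letters):
    """Capture group for one axis, mirroring Capture 1 / Capture 2."""
    h = PARTS[letters]
    # Optional letter, 1-3 [optional space, number, optional unit], optional space + letter.
    return rf"({h}?(?: ?{PARTS['n']}{PARTS['d']}?){{1,3}} ?{h}?)"


# Capture 1, the optional space/comma/space separator, then Capture 2.
PATTERN = axis("ns") + r" ?,? ?" + axis("ew")
COORD = re.compile(PATTERN, re.IGNORECASE)

print(COORD.search('40°7\'22.8"N, 74°7\'22.8"W').groups())
```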

This simplified regex literally matches all the patterns you've given:

^((?:[NW]? ?(?:[-\d.d]+[NW:°´’'",]?[ NW]?)+[, ]*)+[NW]?)$

I'm not an expert on coordinates, but you can easily modify it if I didn't take some specifics into account.
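One way to try this pattern line by line is re.match in Python (a sketch, not part of the answer; matches_format is a hypothetical helper). The character classes only list N, W and d, and the pattern is used without an i flag, so a southern or eastern hemisphere letter such as 45.5 S is not accepted; that is the kind of specific you would add to [NW]. The nested quantifiers ((?:...)+ inside (...)+) can also backtrack heavily on long lines that fail to match.

```python
import re

# Second answer's pattern, unchanged and without the i flag (letters are case-sensitive).
SIMPLE = re.compile(r"""^((?:[NW]? ?(?:[-\d.d]+[NW:°´’'",]?[ NW]?)+[, ]*)+[NW]?)$""")


def matches_format(line):
    """True if the whole line is accepted by the simplified pattern."""
    return SIMPLE.match(line) is not None


for line in ["N 45° 55.732 W 122° 29.882", "40.123, -74.123", "400722.8N740722.8W", "45.5 S"]:
    print(line, matches_format(line))
```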

A full test is here.

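Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM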