
Match regular expression in Ruby

I have strings like these:

201-Grandview-Dr_Early_TX_76802 and /50-Washington-St

I am writing a regex to match both strings.

((/^([0-9]+)-([^_]+)-([A-Za-z]{1,})$/ =~ data ) == 0)

But the above regex matches only 50-Washington-St and not the other string.

So what could be wrong with this regex?

The updated list of strings that should match:

201-Grandview-Dr_Early_TX_76802
/50-Washington-St
49220-Sunrose-Ln_Palm-Desert_CA_92260
201-Grandview-Dr_Early_TX_76802
50-Washington-St

You may fix the regex like this:

/^\/?([0-9]+)-(.+?)-(\w+)$/

or, to match the whole string (mind that in a Ruby regex ^ matches the start of a line and $ the end of a line):

/\A\/?([0-9]+)-(.+?)-(\w+)\z/

See the Rubular demo.
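The difference between the two sets of anchors shows up as soon as the input contains a newline. A minimal sketch (the names line_anchored, string_anchored and multiline, and the sample input, are made up for this illustration):

```ruby
line_anchored   = /^\/?([0-9]+)-(.+?)-(\w+)$/
string_anchored = /\A\/?([0-9]+)-(.+?)-(\w+)\z/

# Hypothetical input: unrelated text on the first line, a valid value on the second.
multiline = "not an address\n50-Washington-St"

# ^ and $ anchor at line boundaries, so the second line alone satisfies the pattern.
p line_anchored.match?(multiline)

# \A and \z anchor at the string boundaries, so the whole string has to fit.
p string_anchored.match?(multiline)
p string_anchored.match?("50-Washington-St")
```

If the strings come from user input, the \A ... \z form is usually the safer default for validation.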

Pattern details:

  • \A - start of string
  • \/? - an optional /
  • ([0-9]+) - Group 1: one or more digits
  • - - a hyphen
  • (.+?) - Group 2: one or more chars other than linebreak chars, as few as possible
  • - - a hyphen
  • (\w+) - Group 3: one or more word ( [A-Za-z0-9_] ) characters
  • \z - end of string.
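To see what each group ends up holding, the pattern can be run over the strings from the question. This is only a sketch; the constant ADDRESS and the variables samples and captures are names chosen for this example:

```ruby
ADDRESS = /\A\/?([0-9]+)-(.+?)-(\w+)\z/

samples = [
  "201-Grandview-Dr_Early_TX_76802",
  "/50-Washington-St",
  "49220-Sunrose-Ln_Palm-Desert_CA_92260",
  "50-Washington-St"
]

# Map each sample to its three captures (nil if the pattern does not match).
captures = samples.map { |s| [s, ADDRESS.match(s)&.captures] }.to_h

captures.each { |s, groups| puts "#{s} => #{groups.inspect}" }
```

Because \w includes the underscore, Group 3 takes everything after the last hyphen. For the longer strings it therefore holds more than the street suffix, and for the Palm-Desert string Group 2 extends up to "Palm". The pattern validates the overall shape but does not separate the individual address fields.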

I would like to suggest a way of approaching problems like this one. The main take-away is that complex regular expressions can be constructed in the same way as other Ruby code: create small code modules that can be easily tested and then combine those modules.

Consider the first string that must match the regex.

s = "201-Grandview-Dr_Early_TX_76802"

As this string contains no characters that need to be escaped, we can create a regex that matches exactly this string by merely replacing the double quotes with forward slashes and adding start-of-string ( \A ) and end-of-string ( \z ) anchors:

r = /\A201-Grandview-Dr_Early_TX_76802\z/
  #=> /\A201-Grandview-Dr_Early_TX_76802\z/ 
s =~ r
  #=> 0 

This is what we have:

/\A201-Grandview-Dr_Early_TX_76802\z/
   ⬆︎street number
          ⬆︎street name
                 ⬆︎street name suffix
                     ⬆︎city
                          ⬆︎state
                               ⬆︎zip

Presumably the regex should match a string if and only if the string contains allowable values for each of these six fields and has the formatting shown between adjacent fields.

Let's begin by stipulating a separate regex for each of the six fields. Naturally, all of these regexes may need to be modified to suit requirements.

Street number

Typical street numbers might be "221", "221B", "221b". Let's say we might also have "A19" or "221BZ" but not "221-B". We might then write:

number = /[[:alnum:]]+/

(Search for "POSIX" in the Regexp documentation.)

Street name

I have assumed street names consist of a single word or multiple words separated by a single space, where each word is all lowercase except for the first letter, which is capitalized.

street = /[[:upper:]][[:lower:]]+(?:\s[[:upper:]][[:lower:]]+)*/

[[:upper:]][[:lower:]]+ matches the first word; (?:\s[[:upper:]][[:lower:]]+)* matches a whitespace character followed by a capitalized word, repeated zero or more times. ( (?:...) is a non-capture group; the trailing * means repeat zero or more times.)
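A fragment like this can be tried on its own by wrapping it in anchors. In the sketch below, whole_street is a hypothetical helper regex introduced only for testing, and the sample names are invented:

```ruby
street = /[[:upper:]][[:lower:]]+(?:\s[[:upper:]][[:lower:]]+)*/

# Hypothetical helper: anchor the fragment so it must match the entire string.
whole_street = /\A#{street}\z/

p whole_street.match?("Grandview")     # a single capitalized word
p whole_street.match?("Grand View")    # two words separated by one space
p whole_street.match?("GrandView")     # capital letter in the middle of a word
p whole_street.match?("Grand  View")   # two spaces between the words
p whole_street.match?("Grand\tView")   # \s also accepts a tab, not only a space

# The (?:...) group does not capture, so there are no captures to report.
p whole_street.match("Grand View").captures
```

If only a literal space should be allowed between words, \s could be swapped for a space character; that would be a change to the fragment as given, so treat it as an option rather than a requirement.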

Street name suffix

I have assumed the street name suffix (e.g., "Street", "St.") is a single word, all lowercase except the first character, which is uppercase, optionally ending with a period:

suffix = /[[:upper:]][[:lower:]]+\.?/

City

I have assumed that names of cities have the same requirements as names of streets:

city = street
  #=> /[[:upper:]][[:lower:]]+(?:\s[[:upper:]][[:lower:]]+)*/

State

States are given by two capital letters:

state = /[[:upper:]]{2}/

We could be more precise by writing:

state = Regexp.union %w| AL AK AZ ... |

but then we'd have to update it every time a territory became a new state or (possibly due to recent events) a state seceded from the union.
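As a rough illustration of the Regexp.union variant, here it is with only the three abbreviations that are spelled out above. The list is deliberately left incomplete, and sample_states and whole_state are names invented for this sketch:

```ruby
# Deliberately incomplete sample list; the rest is elided in the text above.
sample_states = %w[AL AK AZ]

state = Regexp.union(sample_states)
p state   # /AL|AK|AZ/

# Interpolating a Regexp wraps it in (?-mix:...), so the alternation stays
# scoped to the state field when it is combined with anchors or other fields.
whole_state = /\A#{state}\z/
puts whole_state.source

p whole_state.match?("AK")
p whole_state.match?("ZZ")
p whole_state.match?("ALX")
```

The grouping matters: a hand-written /\AAL|AK|AZ\z/ would anchor only the first and last alternatives and would accept "ALX".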

Zip code

Zip codes are five digits, or nine digits with a hyphen after the first five digits.

zip = /\d{5}(?:-\d{4})?/
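Before the fragments are combined, each one can be checked in isolation. The following is a small table-driven sketch; the helper whole? and the sample values in cases are assumptions made for this example, not part of the original answer:

```ruby
number = /[[:alnum:]]+/
suffix = /[[:upper:]][[:lower:]]+\.?/
state  = /[[:upper:]]{2}/
zip    = /\d{5}(?:-\d{4})?/

# Hypothetical helper: true only if the fragment matches the entire string.
def whole?(fragment, str)
  /\A#{fragment}\z/.match?(str)
end

# [fragment, input, expected result]
cases = [
  [number, "221B",       true],
  [number, "221-B",      false],
  [suffix, "Dr.",        true],
  [suffix, "dr",         false],
  [state,  "TX",         true],
  [state,  "Tx",         false],
  [zip,    "76802",      true],
  [zip,    "76802-0000", true],
  [zip,    "7680",       false],
  [zip,    "768020000",  false]
]

# Collect every case whose actual result differs from the expected one.
failures = cases.reject { |fragment, str, expected| whole?(fragment, str) == expected }
puts failures.empty? ? "all field checks passed" : failures.inspect
```

Keeping the cases in a table makes it cheap to add a new input whenever a requirement changes.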

Using

/\A201-Grandview-Dr_Early_TX_76802\z/

as our pattern, our overall regex is therefore the following:

r1 = /
     \A # match start of string 
     #{number}
     -
     #{street}
     -
     #{suffix}
     _
     #{city}
     _
     #{state}
     _
     #{zip}
     \z # match end of string
     /x # free-spacing regex definition mode
  #=> /
  #   \A # match start of string 
  #   (?-mix:[[:alnum:]]+)
  #   -
  #   (?-mix:[[:upper:]][[:lower:]]+(?:\s[[:upper:]][[:lower:]]+)*)
  #   -
  #   (?-mix:[[:upper:]][[:lower:]]+\.?)
  #   _
  #   (?-mix:[[:upper:]][[:lower:]]+(?:\s[[:upper:]][[:lower:]]+)*)
  #   _
  #   (?-mix:[[:upper:]]{2})
  #   _
  #   (?-mix:\d{5}(?:-\d{4})?)
  #   \z # match end of string
  #   /x

Let's try it for the first string and variations thereof:

"201-Grandview-Dr_Early_TX_76802" =~ r1
   #=> 0
"221B-Grand View-Dr._El Paso_TX_76802-0000" =~ r1
   #=> 0
"2A0B1-Grandview-Dr_Early_ZZ_76802" =~ r1
   #=> 0
"201-GrandView-Dr_Early_TX_76802" =~ r1
   #=> nil
"201-Grandview-Dr_Early_TX_7680" =~ r1
   #=> nil
"201-Pi11ar-St_Early_TX_76802" =~ r1
   #=> nil
"I live at 201-Grandview-Dr_Early_TX_76802" =~ r1
   #=> nil
"201-😎mg Circle-Lane_Early_TX_76802" =~ r1
   #=> nil

Now consider the second example string for which there should be a match:

"/50-Washington-St"

We see the regex for this is simply

r2 = /
     \A
     \/
     #{number}
     -
     #{street}
     -
     #{suffix}
     \z
     /x
 #=> /
 #   \A
 #   \/
 #   (?-mix:[[:alnum:]]+)
 #   -
 #   (?-mix:[[:upper:]][[:lower:]]+(?:\s[[:upper:]][[:lower:]]+)*)
 #   -
 #   (?-mix:[[:upper:]][[:lower:]]+\.?)
 #   \z
 #   /x 

Let's try it.

 "/50-Washington-St" =~ r2
   #=> 0
 "50-Washington-St" =~ r2
   #=> nil
 "/50-Washington-St_Early" =~ r2
   #=> nil

So now our overall regex is simply

r = Regexp.union(r1,r2)
  #=> /(?x-mi:
  #   \A # match start of string 
  #   (?-mix:[[:alnum:]]+)
  #   -
  #   (?-mix:[[:upper:]][[:lower:]]+(?:\s[[:upper:]][[:lower:]]+)*)
  #   -
  #   (?-mix:[[:upper:]][[:lower:]]+\.?)
  #   _
  #   (?-mix:[[:upper:]][[:lower:]]+(?:\s[[:upper:]][[:lower:]]+)*)
  #   _
  #   (?-mix:[[:upper:]]{2})
  #   _
  #   (?-mix:\d{5}(?:-\d{4})?)
  #   \z # match end of string
  #   )|(?x-mi:
  #   \A
  #   \/
  #   (?-mix:[[:alnum:]]+)
  #   -
  #   (?-mix:[[:upper:]][[:lower:]]+(?:\s[[:upper:]][[:lower:]]+)*)
  #   -
  #   (?-mix:[[:upper:]][[:lower:]]+\.?)
  #   \z
  #   )/ 

"201-Grandview-Dr_Early_TX_76802" =~ r
  #=> 0
"/50-Washington-St" =~ r
  #=> 0
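Checked against the updated list in the question, the combined regex as built above accepts "201-Grandview-Dr_Early_TX_76802" and "/50-Washington-St", but not "49220-Sunrose-Ln_Palm-Desert_CA_92260" (hyphenated city) or "50-Washington-St" (no leading slash). The sketch below reproduces that check in compact form and shows one possible relaxation. The names words, city_relaxed and r_relaxed, and the two relaxed rules themselves, are assumptions made for this illustration and may not suit the real requirements:

```ruby
number = /[[:alnum:]]+/
words  = /[[:upper:]][[:lower:]]+(?:\s[[:upper:]][[:lower:]]+)*/  # street and city, as defined above
suffix = /[[:upper:]][[:lower:]]+\.?/
state  = /[[:upper:]]{2}/
zip    = /\d{5}(?:-\d{4})?/

# Same structure as r1 and r2 above, written without free-spacing mode.
r = Regexp.union(
  /\A#{number}-#{words}-#{suffix}_#{words}_#{state}_#{zip}\z/,
  /\A\/#{number}-#{words}-#{suffix}\z/
)

# Assumed adjustments (not part of the answer above):
#   * city words may be joined by whitespace or a hyphen ("Palm-Desert")
#   * the leading slash of the short form is optional
city_relaxed = /[[:upper:]][[:lower:]]+(?:[\s\-][[:upper:]][[:lower:]]+)*/
r_relaxed = Regexp.union(
  /\A#{number}-#{words}-#{suffix}_#{city_relaxed}_#{state}_#{zip}\z/,
  /\A\/?#{number}-#{words}-#{suffix}\z/
)

updated_list = [
  "201-Grandview-Dr_Early_TX_76802",
  "/50-Washington-St",
  "49220-Sunrose-Ln_Palm-Desert_CA_92260",
  "50-Washington-St"
]

updated_list.each do |s|
  puts format("%-40s strict: %-5s relaxed: %s", s, r.match?(s), r_relaxed.match?(s))
end
```

Because each field lives in its own fragment, a requirement change such as the hyphenated city touches one small regex rather than the whole pattern.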
