
Regex to match end of line or whitespace followed by wildcard characters

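I have a string from which I am trying to match a city and a state with a regular expression in Python. Some of the strings end with a space followed by a country code. I am having trouble writing a regex that matches all cases, capturing the city in the first capture group and the state in the second capture group.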

[^.*]?Born:.*in[^.](.*),[^.*](.*)

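This is the regex I currently have, and these are some of the sample strings I am trying to match.

  1. Born: November 8, 1961 in Chicago, Illinois
  2. Born: February 19, 1995 in Sombor, Serbia rs
  3. Born: May 19, 1976 in Greenville, South Carolina us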

With my current regex, this is my current output:

  1. (Chicago)(Illinois)
  2. (Sombor)(Serbia rs)
  3. (Greenville)(South Carolina us)

The expected output would be

  1. (Chicago)(Illinois)
  2. (Sombor)(Serbia)
  3. (Greenville)(South Carolina)

How do I account for this trailing string made up of a space and two characters? Any help would be much appreciated.
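The behavior described in the question can be reproduced with a minimal sketch like the one below. The exact English wording of the sample strings (in particular the date format) is an assumption, and `current_output` is a hypothetical helper name. Note that `[^.*]` is a character class that matches one character other than a literal `.` or `*`, so it does not act as a wildcard, and the final `(.*)` takes everything up to the end of the string, including the trailing country code.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"[^.*]?Born:.*in[^.](.*),[^.*](.*)")

# Assumed English forms of the three sample strings (date format is a guess).
SAMPLES = [
    "Born: November 8, 1961 in Chicago, Illinois",
    "Born: February 19, 1995 in Sombor, Serbia rs",
    "Born: May 19, 1976 in Greenville, South Carolina us",
]


def current_output(text):
    """Return the two captured groups, or None when nothing matches."""
    match = ORIGINAL.search(text)
    return match.groups() if match else None


for sample in SAMPLES:
    # The second group still carries the trailing " rs" / " us".
    print(current_output(sample))
```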

Use

Born:.*in\s+([^,]*),\s+(.*?)(?=(?:\s[A-Za-z]{2})?$)

See the regex proof.

Explanation

Born: - matches the characters Born: literally (case sensitive)
.* - matches any character (except for line terminators), between zero and unlimited times, as many times as possible, giving back as needed (greedy)
in - matches the characters in literally (case sensitive)
\s+ - matches any whitespace character (equivalent to [\r\n\t\f\v  ]) between one and unlimited times, as many times as possible, giving back as needed (greedy)
1st Capturing Group ([^,]*)
  Match a single character not present in the list below [^,]* between zero and unlimited times, as many times as possible, giving back as needed (greedy)
  , - matches the character , with index 44 (decimal; 2C in hexadecimal, 54 in octal) literally (case sensitive)
, - matches the character , with index 44 (decimal; 2C in hexadecimal, 54 in octal) literally (case sensitive)
\s+ - matches any whitespace character (equivalent to [\r\n\t\f\v  ]) between one and unlimited times, as many times as possible, giving back as needed (greedy)
2nd Capturing Group (.*?)
.*? - matches any character (except for line terminators) between zero and unlimited times, as few times as possible, expanding as needed (lazy)
Positive Lookahead (?=(?:\s[A-Za-z]{2})?$)
  Assert that the Regex below matches
  Non-capturing group (?:\s[A-Za-z]{2})?
  ? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
  \s matches any whitespace character (equivalent to [\r\n\t\f\v  ])
  Match a single character present in the list below [A-Za-z]
  {2} matches the previous token exactly 2 times
  A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
  a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
  $ asserts position at the end of a line (in Python without re.MULTILINE, this is the end of the string or just before a trailing newline)
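
As a rough illustration of the pattern explained above, the following sketch applies it with Python's `re` module. The sample strings are assumed English forms of the ones in the question, and `city_and_state` is a hypothetical helper name, not part of the original answer.

```python
import re

# The pattern from the answer: the lazy second group stops before an
# optional " xx" (whitespace plus two letters) at the end of the string.
PATTERN = re.compile(r"Born:.*in\s+([^,]*),\s+(.*?)(?=(?:\s[A-Za-z]{2})?$)")

# Assumed English forms of the three sample strings (date format is a guess).
SAMPLES = [
    "Born: November 8, 1961 in Chicago, Illinois",
    "Born: February 19, 1995 in Sombor, Serbia rs",
    "Born: May 19, 1976 in Greenville, South Carolina us",
]


def city_and_state(text):
    """Return (city, state) from a 'Born: ... in City, State [cc]' string."""
    match = PATTERN.search(text)
    return match.groups() if match else None


for sample in SAMPLES:
    print(city_and_state(sample))
```

One caveat with this approach: the lookahead cannot tell a country code from any other two-letter word, so a state or region name whose last word has exactly two letters would lose that word as well.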
