Regex matching end of line or a space followed by a wildcard

Question

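I have a string from which I'm trying to extract a city and state using a regular expression in Python. Some of the strings end with a country code preceded by a space. I'm having trouble writing a regex that handles all cases, capturing the city in the first capturing group and the state in the second.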

[^.*]?Born:.*in[^.](.*),[^.*](.*)

This is the regex I currently have, and these are some of the sample strings I'm trying to match:

Born: November 8, 1961 in Chicago, Illinois
Born: February 19, 1995 in Sombor, Serbia rs
Born: May 19, 1976 in Greenville, South Carolina us

With my current regex, this is the output I get:

(Chicago) (Illinois)
(Sombor) (Serbia rs )
(Greenville) (South Carolina us)
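To see where this output comes from, here is a minimal Python sketch that runs the original pattern unchanged with re.search. It assumes the three sample strings read exactly as listed above, with no trailing whitespace; the names ORIGINAL and SAMPLES are illustrative only.

```python
import re

# The original pattern from the question, unchanged.
ORIGINAL = r"[^.*]?Born:.*in[^.](.*),[^.*](.*)"

# Sample strings from the question (assumed to have no trailing whitespace).
SAMPLES = [
    "Born: November 8, 1961 in Chicago, Illinois",
    "Born: February 19, 1995 in Sombor, Serbia rs",
    "Born: May 19, 1976 in Greenville, South Carolina us",
]

for line in SAMPLES:
    m = re.search(ORIGINAL, line)
    # Group 2 is a greedy (.*) with nothing after it, so it always runs
    # to the end of the string and keeps any trailing two-letter code.
    print(m.group(1), "|", m.group(2))
```

This prints Chicago | Illinois, Sombor | Serbia rs and Greenville | South Carolina us. Since nothing follows the second group, there is no way for it to stop before the country code.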

The expected output would be:

(Chicago) (Illinois)
(Sombor) (Serbia)
(Greenville) (South Carolina)

How can I account for this trailing string made up of a space and two characters? Any help would be much appreciated.

Answer 1

Use

Born:.*in\s+([^,]*),\s+(.*?)(?=(?:\s[A-Za-z]{2})?$)

See regex proof.
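A minimal sketch of applying this pattern in Python, assuming one "Born: ..." record per string. The helper name city_state is hypothetical; it returns the (city, state) tuple, or None when the line does not match.

```python
import re

# Pattern from the answer: group 1 = city, group 2 = state/country.
# The lazy (.*?) stops at the first point where the lookahead sees either
# the end of the string, or a space + two letters + end of the string.
PATTERN = re.compile(r"Born:.*in\s+([^,]*),\s+(.*?)(?=(?:\s[A-Za-z]{2})?$)")

SAMPLES = [
    "Born: November 8, 1961 in Chicago, Illinois",
    "Born: February 19, 1995 in Sombor, Serbia rs",
    "Born: May 19, 1976 in Greenville, South Carolina us",
]

def city_state(line):
    """Hypothetical helper: return (city, state) or None if no match."""
    m = PATTERN.search(line)
    return m.groups() if m else None

for line in SAMPLES:
    print(city_state(line))
```

For the three samples this yields ('Chicago', 'Illinois'), ('Sombor', 'Serbia') and ('Greenville', 'South Carolina'). One caveat: a state or country whose name itself ends in a space plus a two-letter word would lose that word as well, because the lookahead cannot distinguish it from a country code.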

Explanation

Born: - matches the characters Born: literally (case sensitive)
.* - matches any character (except for line terminators), between zero and unlimited times, as many times as possible, giving back as needed (greedy)
in - matches the characters in literally (case sensitive)
\s+ - matches any whitespace character (equivalent to [\r\n\t\f\v  ]) between one and unlimited times, as many times as possible, giving back as needed (greedy)
1st Capturing Group ([^,]*)
  [^,]* - matches a single character not present in the list below, between zero and unlimited times, as many times as possible, giving back as needed (greedy)
    , - matches the character , with index 44 in decimal (2C in hex, 54 in octal) literally (case sensitive)
, - matches the character , with index 44 in decimal (2C in hex, 54 in octal) literally (case sensitive)
\s+ - matches any whitespace character (equivalent to [\r\n\t\f\v  ]) between one and unlimited times, as many times as possible, giving back as needed (greedy)
2nd Capturing Group (.*?)
  .*? - matches any character (except for line terminators) between zero and unlimited times, as few times as possible, expanding as needed (lazy)
Positive Lookahead (?=(?:\s[A-Za-z]{2})?$)
  Assert that the Regex below matches
  Non-capturing group (?:\s[A-Za-z]{2})?
  ? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
  \s matches any whitespace character (equivalent to [\r\n\t\f\v  ])
  Match a single character present in the list below [A-Za-z]
  {2} matches the previous token exactly 2 times
  A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
  a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
  $ asserts position at the end of the string or just before a final newline (with the re.MULTILINE flag, at the end of every line)
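The laziness of (.*?) is what lets the lookahead trim the code. The sketch below, using the Sombor sample from the question, runs the answer's pattern next to a hypothetical variant that differs only in using a greedy (.*), and also checks that Python's $ still matches in front of a single trailing newline.

```python
import re

LINE = "Born: February 19, 1995 in Sombor, Serbia rs"

# The answer's pattern, and a hypothetical variant that differs only in
# the quantifier of group 2 (greedy .* instead of lazy .*?).
LAZY = r"Born:.*in\s+([^,]*),\s+(.*?)(?=(?:\s[A-Za-z]{2})?$)"
GREEDY = r"Born:.*in\s+([^,]*),\s+(.*)(?=(?:\s[A-Za-z]{2})?$)"

# Lazy: group 2 grows one character at a time and stops at the first
# position where the lookahead holds, i.e. just before " rs".
lazy_state = re.search(LAZY, LINE).group(2)

# Greedy: group 2 first runs to the end of the string; there the optional
# group matches nothing, $ succeeds, and the code " rs" is kept.
greedy_state = re.search(GREEDY, LINE).group(2)

# Without re.MULTILINE, $ also matches just before a final newline,
# so a line read from a file with its "\n" still works.
trailing_nl_state = re.search(LAZY, LINE + "\n").group(2)

print(repr(lazy_state))         # 'Serbia'
print(repr(greedy_state))       # 'Serbia rs'
print(repr(trailing_nl_state))  # 'Serbia'
```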

Solution 1
0 2022-04-15 20:58:52
