Regex to match end of line or whitespace followed by wildcard characters

Question

I have a string in which I'm trying to match a city and state with a regular expression in Python. Some of the strings end with a country code preceded by a space. I'm having trouble writing a regular expression that matches all the cases and captures the city in the first capture group and the state in the second capture group.

[^.*]?Born:.*in[^.](.*),[^.*](.*)

This is the regular expression that I have so far, and these are some example strings that I'm trying to match.

Born: November 8, 1961 in Chicago, Illinois
Born: February 19, 1995 in Sombor, Serbia rs
Born: May 19, 1976 in Greenville, South Carolina us

With my current regular expression, this is the output:

(Chicago) (Illinois)
(Sombor) (Serbia rs )
(Greenville) (South Carolina us)

The expected output is:

(Chicago) (Illinois)
(Sombor) (Serbia)
(Greenville) (South Carolina)

How can I account for this trailing space followed by two characters? Any help would be greatly appreciated.

Answer 1

Use

Born:.*in\s+([^,]*),\s+(.*?)(?=(?:\s[A-Za-z]{2})?$)

See regex proof.
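A minimal Python sketch of applying the pattern above with the standard `re` module to the three example strings from the question. The names `PATTERN`, `lines`, and `results` are illustrative and not part of the original answer.

```python
import re

# The answer's pattern: city in group 1, state in group 2.
PATTERN = re.compile(r"Born:.*in\s+([^,]*),\s+(.*?)(?=(?:\s[A-Za-z]{2})?$)")

lines = [
    "Born: November 8, 1961 in Chicago, Illinois",
    "Born: February 19, 1995 in Sombor, Serbia rs",
    "Born: May 19, 1976 in Greenville, South Carolina us",
]

results = []
for line in lines:
    m = PATTERN.search(line)
    if m:  # skip lines that do not match at all
        results.append(m.groups())
        print(f"({m.group(1)}) ({m.group(2)})")
```

This should print (Chicago) (Illinois), (Sombor) (Serbia), and (Greenville) (South Carolina). Note that the lookahead strips any trailing two-letter word, so a state whose name genuinely ended in a separate two-letter word would be trimmed as well.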

EXPLANATION

Born: - matches the characters Born: literally (case sensitive)
.* - matches any character (except for line terminators), between zero and unlimited times, as many times as possible, giving back as needed (greedy)
in - matches the characters in literally (case sensitive)
\s+ - matches any whitespace character (equivalent to [\r\n\t\f\v  ]) between one and unlimited times, as many times as possible, giving back as needed (greedy)
1st Capturing Group ([^,]*)
  [^,]* - matches a single character not present in the list below, between zero and unlimited times, as many times as possible, giving back as needed (greedy)
    , - the character , with index 44 (hex 2C, octal 54)
, - matches the character , with index 44 (hex 2C, octal 54) literally (case sensitive)
\s+ - matches any whitespace character (equivalent to [\r\n\t\f\v  ]) between one and unlimited times, as many times as possible, giving back as needed (greedy)
2nd Capturing Group (.*?)
  .*? - matches any character (except for line terminators) between zero and unlimited times, as few times as possible, expanding as needed (lazy)
Positive Lookahead (?=(?:\s[A-Za-z]{2})?$)
  Assert that the regex below matches at the current position, without consuming any characters
  Non-capturing group (?:\s[A-Za-z]{2})?
    ? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
    \s matches any whitespace character (equivalent to [\r\n\t\f\v  ])
    [A-Za-z]{2} matches a single character present in the list below, exactly 2 times
      A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
      a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
  $ asserts position at the end of the string (in Python, also just before a trailing newline; with re.MULTILINE, at the end of each line)
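To show why the second group has to be lazy, here is a small sketch using Python's `re` module that runs the same pattern with a greedy (.*) next to the answer's lazy (.*?). The names `GREEDY`, `LAZY`, and `text` are hypothetical, chosen only for this illustration.

```python
import re

text = "Born: February 19, 1995 in Sombor, Serbia rs"

# Greedy variant: (.*) first grabs "Serbia rs"; the optional group in the
# lookahead can then match empty right before $, so nothing is trimmed.
GREEDY = r"Born:.*in\s+([^,]*),\s+(.*)(?=(?:\s[A-Za-z]{2})?$)"

# The answer's lazy variant: (.*?) stops at the first position where the
# lookahead succeeds, i.e. just before " rs".
LAZY = r"Born:.*in\s+([^,]*),\s+(.*?)(?=(?:\s[A-Za-z]{2})?$)"

greedy = re.search(GREEDY, text)
lazy = re.search(LAZY, text)
print(greedy.group(2))  # Serbia rs
print(lazy.group(2))    # Serbia

# The lookahead does not consume " rs", so it is not in the overall match.
print(lazy.group(0))    # Born: February 19, 1995 in Sombor, Serbia

# Without re.MULTILINE, $ also matches just before a single trailing newline.
with_newline = re.search(LAZY, text + "\n")
print(with_newline.group(2))  # Serbia
```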
