I have a string in which I'm trying to match a city and state with a regular expression in Python. Some of the strings end with a country code that is preceded by a space. I'm having trouble writing a regular expression that matches all the cases and captures the city in the first capture group and the state in the second capture group.
[^.*]?Born:.*in[^.](.*),[^.*](.*)
This is the regular expression that I have so far, and these are some example strings that I'm trying to match.
Based on my current regular expression this is my current output:
Expected outputs would be
How can I account for this trailing space and two characters? Any help would be greatly appreciated.
Use
Born:.*in\s+([^,]*),\s+(.*?)(?=(?:\s[A-Za-z]{2})?$)
See regex proof.
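As a quick check in Python, the pattern can be run with re.search. This is only a minimal sketch: the sample strings and the helper name city_state are made up for illustration, since the original example strings are not shown here.

```python
import re

# The pattern from the answer, compiled once for reuse.
PATTERN = re.compile(r"Born:.*in\s+([^,]*),\s+(.*?)(?=(?:\s[A-Za-z]{2})?$)")


def city_state(text):
    """Return (city, state) if the pattern matches, otherwise None.

    Hypothetical helper, not part of the original answer.
    """
    match = PATTERN.search(text)
    return match.groups() if match else None


# Hypothetical inputs: one without and one with a trailing country code.
without_code = city_state("Born: 1985 in Portland, Oregon")
with_code = city_state("Born: 1985 in Portland, Oregon us")
no_match = city_state("No birthplace listed")

print(without_code)  # ('Portland', 'Oregon')
print(with_code)     # ('Portland', 'Oregon')
print(no_match)      # None
```

Both inputs give the same pair because the optional space and two letters are only checked by the lookahead and never become part of the second group.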
EXPLANATION
Born: - matches the characters Born: literally (case sensitive)
.* - matches any character (except for line terminators), between zero and unlimited times, as many times as possible, giving back as needed (greedy)
in - matches the characters in literally (case sensitive)
\s+ - matches any whitespace character (equivalent to [\r\n\t\f\v ]) between one and unlimited times, as many times as possible, giving back as needed (greedy)
1st Capturing Group ([^,]*)
[^,]* - matches any single character other than a comma, between zero and unlimited times, as many times as possible, giving back as needed (greedy)
, - matches the character , (index 44 in decimal, 2C in hexadecimal, 54 in octal) literally (case sensitive)
\s+ - matches any whitespace character (equivalent to [\r\n\t\f\v ]) between one and unlimited times, as many times as possible, giving back as needed (greedy)
2nd Capturing Group (.*?)
.*? - matches any character (except for line terminators) between zero and unlimited times, as few times as possible, expanding as needed (lazy)
Positive Lookahead (?=(?:\s[A-Za-z]{2})?$)
Assert that the Regex below matches
Non-capturing group (?:\s[A-Za-z]{2})?
? - matches the previous token (the non-capturing group) between zero and one times, as many times as possible, giving back as needed (greedy)
\s - matches any whitespace character (equivalent to [\r\n\t\f\v ])
[A-Za-z] - matches a single character present in the list below
A-Z - matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
a-z - matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
{2} - matches the previous token exactly 2 times
$ - asserts position at the end of the string (in Python, also just before a trailing newline; with re.MULTILINE, at the end of each line)
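To see why the second group is lazy and followed by a lookahead, it may help to compare three variants of the pattern on one input. This is a small sketch with a made-up sample string; the variable names are illustrative only.

```python
import re

text = "Born: 1985 in Portland, Oregon us"  # hypothetical input

# Greedy second group, no lookahead: the trailing code is swallowed.
greedy = re.search(r"Born:.*in\s+([^,]*),\s+(.*)", text).group(2)

# Lazy second group, no lookahead: nothing forces it to expand, so it stays empty.
lazy_only = re.search(r"Born:.*in\s+([^,]*),\s+(.*?)", text).group(2)

# Lazy second group plus the lookahead: it expands just far enough that
# what follows is an optional " xx" code and then the end of the string.
lazy_lookahead = re.search(
    r"Born:.*in\s+([^,]*),\s+(.*?)(?=(?:\s[A-Za-z]{2})?$)", text
).group(2)

print(repr(greedy))          # 'Oregon us'
print(repr(lazy_only))       # ''
print(repr(lazy_lookahead))  # 'Oregon'
```

The lookahead is what anchors the lazy group: it consumes no characters itself, so the country code is checked but left out of the capture.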