I have a long string to split.
str1 = ' BATON ROUGE, LA -- Ascension, Assumption, East Baton Rouge, East Feliciana, Iberville, Livingston, Pointe Coupee, St. Helena, St. Mary, West Baton Rouge, West Feliciana Parishes, LA; Amite and Wilkinson Counties, MS. BEAUMONT-PORT ARTHUR, TX -- Hardin, Jasper, Jefferson, Newton, Orange,Tyler Counties, TX. '
expected outputs are:
sub1 = 'BATON ROUGE, LA -- Ascension, Assumption, East Baton Rouge, East Feliciana, Iberville, Livingston, Pointe Coupee, St. Helena, St. Mary, West Baton Rouge, West Feliciana Parishes, LA; Amite and Wilkinson Counties, MS.'
sub2 = 'BEAUMONT-PORT ARTHUR, TX -- Hardin, Jasper, Jefferson, Newton, Orange,Tyler Counties, TX.'
sub1 and sub2 each contain a region name and state name, along with the associated county list.
If I split only on '.', I run into trouble because some county names also contain '.'. How can I split on a pattern so that each of sub1 and sub2 ends with a state abbreviation followed by '.', such as 'MS.' or 'TX.' here? Thank you for your help.
You can try this:
import re
str1 = ' BATON ROUGE, LA -- Ascension, Assumption, East Baton Rouge, East Feliciana, Iberville, Livingston, Pointe Coupee, St. Helena, St. Mary, West Baton Rouge, West Feliciana Parishes, LA; Amite and Wilkinson Counties, MS. BEAUMONT-PORT ARTHUR, TX -- Hardin, Jasper, Jefferson, Newton, Orange,Tyler Counties, TX. '
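# Split on a "." that directly follows whitespace plus two capital letters (a state abbreviation).
# A raw string keeps the backslashes in the pattern from being treated as string escapes.
new_data = re.split(r"(?<=\s[A-Z]{2})\.", str1)
# strip() removes the leading space left on each piece.
print(new_data[0].strip())
print(new_data[1].strip())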
Output:
BATON ROUGE, LA -- Ascension, Assumption, East Baton Rouge, East Feliciana, Iberville, Livingston, Pointe Coupee, St. Helena, St. Mary, West Baton Rouge, West Feliciana Parishes, LA; Amite and Wilkinson Counties, MS
BEAUMONT-PORT ARTHUR, TX -- Hardin, Jasper, Jefferson, Newton, Orange,Tyler Counties, TX
Regex explanation:
\s[A-Z]{2}
: matches a two-letter capital abbreviation (i.e. the state abbreviation) preceded by whitespace.
(?<=\s[A-Z]{2})\.
: positive look-behind; matches a . only if it is preceded by the pattern above.
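The split above consumes the '.', so the pieces end in 'MS' and 'TX' rather than 'MS.' and 'TX.' as in the expected outputs. One possible variation, offered only as a sketch, is to move the '.' inside the look-behind and split on the whitespace that follows it. The name parts is just an illustrative choice, and the pattern assumes every record ends with a two-letter capital abbreviation and a '.'.

```python
import re

str1 = ' BATON ROUGE, LA -- Ascension, Assumption, East Baton Rouge, East Feliciana, Iberville, Livingston, Pointe Coupee, St. Helena, St. Mary, West Baton Rouge, West Feliciana Parishes, LA; Amite and Wilkinson Counties, MS. BEAUMONT-PORT ARTHUR, TX -- Hardin, Jasper, Jefferson, Newton, Orange,Tyler Counties, TX. '

# Split on whitespace that comes right after "<space><two capitals>.".
# The "." sits inside the look-behind, so it is not consumed and stays
# attached to the end of each piece. "St." is not matched because "t"
# is not a capital letter.
# strip() first, so no empty or whitespace-only piece is left at the end.
parts = re.split(r"(?<=\s[A-Z]{2}\.)\s+", str1.strip())

for part in parts:
    print(part)
```

With the sample string this should give two pieces, the first ending in 'MS.' and the second in 'TX.'.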