
How to regex the beginning and the end of a sentence - python

I have a list of strings containing dates, country, and city:

myList = ["(1922, May, 22; USA; CHICAGO)","(1934, June, 15; USA; BOSTON)"]

I want to extract only the date and the city (cities are always in capital letters). So far I have this:

for info in myList:

        pattern_i = re.compile(r"[^;]+")
        pattern_f = re.compile(r";\s\b([A-Z]+)\)")

        mi = re.match(pattern_i, info)
        mf = re.match(pattern_f, info)

        print(mi)
        print(mf)

I am getting:

<re.Match object; span=(0, 14), match='(1922, May, 22'>
None
<re.Match object; span=(0, 15), match='(1934, June, 15'>
None

I've tried so many things and can't seem to find a solution. What am I missing here?

Regex is overkill for data with simple, consistent formatting. This can be done easily with Python's built-in string methods.

for entry in myList:
    date, country, city = [x.strip() for x in entry[1:-1].split(';')]

# Explanation
entry[1:-1]             # Drop the surrounding parentheses
entry[1:-1].split(';')  # Split into a list of strings on the ';' character
x.strip()               # Strip surrounding whitespace from each piece

Regex for the date: ^\(([^;]+)

Regex for the city: ([A-Z]+)\)$
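
A minimal sketch of how these two patterns might be applied, assuming the list from the question and the city character class written as [A-Z]. The names DATE_RE, CITY_RE and extract are illustrative and not part of the original answer. Note that re.search is used, because the city pattern does not start at the beginning of the string.

```python
import re

my_list = ["(1922, May, 22; USA; CHICAGO)", "(1934, June, 15; USA; BOSTON)"]

# Date: everything after the opening parenthesis up to the first ';'
DATE_RE = re.compile(r"^\(([^;]+)")
# City: a run of capital letters directly before the closing parenthesis
CITY_RE = re.compile(r"([A-Z]+)\)$")


def extract(entry):
    """Return (date, city) for one entry, or None if either pattern fails."""
    date = DATE_RE.search(entry)
    city = CITY_RE.search(entry)
    if date is None or city is None:
        return None
    return date.group(1), city.group(1)


results = [extract(entry) for entry in my_list]
print(results)
```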

You can use pandas:

import pandas as pd

p = r'\((?P<date>.*);.*;\s*(?P<city>.*)\)'

pd.Series(myList).str.extract(p)

Output:

             date      city
0   1922, May, 22   CHICAGO
1  1934, June, 15    BOSTON
Thanks, but I am still curious: why am I getting None for mf?

Python offers two different primitive operations based on regular expressions: re.match() checks for a match only at the beginning of the string, while re.search() checks for a match anywhere in the string (this is what Perl does by default). (Ref: Python docs)


re.match only looks for a match at the beginning of the string. The pattern you're trying to match isn't at the start of the string, so you get None. re.search is one option for finding a match anywhere in the string.
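
The difference can be seen by running the question's pattern_f against one entry with both functions. This is a small sketch; the variable names mf_match and mf_search are made up for illustration.

```python
import re

info = "(1922, May, 22; USA; CHICAGO)"
pattern_f = re.compile(r";\s\b([A-Z]+)\)")

# match() is anchored at index 0, where the string has "(" rather than ";"
mf_match = pattern_f.match(info)

# search() scans forward and succeeds at the "; CHICAGO)" portion
mf_search = pattern_f.search(info)

print(mf_match)            # None
print(mf_search.group(1))  # CHICAGO
```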


As I suggested, split is a better option here: split on ; and take the first and last elements to get the desired output.
