How to use regex capture groups in pandas replace function

I've got a pandas DataFrame that uses "2Nd" instead of "2nd", "136Th" instead of "136th", etc. I want the letter immediately following the number to be lowercase.

Sample data:

import pandas as pd

data = pd.Series(['21St StNew York', 'Exampe BlvdSt Louis', '1St Rd'])

Desired output:

['21st StNew York', 'Exampe BlvdSt Louis', '1st Rd']

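I tried using str.replace(), but \B matches at any position that is not a word boundary, so the "St" inside "BlvdSt" gets lowercased too:

data = data.str.replace(r'\BSt', 'st', regex=True)
['21st StNew York', 'Exampe Blvdst Louis', '1st Rd']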

Is it possible to use a capture group? My attempt replaces the digits along with the letters:

data = data.str.replace('[0-9]+(St)', 'st', regex=True)
['st StNew York', 'Exampe BlvdSt Louis', 'st Rd']

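Use a callable for repl. It receives each match object, so the matched text can be lowercased and returned. Pass regex=True explicitly: since pandas 2.0 the default is regex=False, and a callable repl raises a ValueError in that mode.

new_data = data.str.replace(r'(\d+[A-Z])', lambda m: m.group(1).lower(), regex=True)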

Out[49]:
0        21st StNew York
1    Exampe BlvdSt Louis
2                 1st Rd
dtype: object
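
To answer the capture-group question directly: the attempt with `[0-9]+(St)` loses the digits because the entire match, digits included, is replaced. A minimal sketch, assuming only the "St" suffix needs fixing, is to capture the digits instead and write them back with the `\1` backreference (the variable name `fixed` is just for illustration):

```python
import pandas as pd

data = pd.Series(['21St StNew York', 'Exampe BlvdSt Louis', '1St Rd'])

# Capture the digits in group 1, then re-insert them via \1
# followed by the lowercase suffix.
fixed = data.str.replace(r'(\d+)St', r'\1st', regex=True)

print(fixed.tolist())
```

This handles "St" only; the callable approach above is more general because it lowercases whatever letter follows the number.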

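We can try a regex replacement on the pattern (?<=\d)[A-Z], which matches an uppercase letter immediately preceded by a digit, and replace each match with its lowercase version: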

df['dat'] = df['data'].str.replace(r'(?<=\d)[A-Z]', lambda x: x.group(0).lower(), regex=True)
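
As a self-contained sketch of the lookbehind approach, the snippet below builds a small DataFrame (the sample rows and the `strict` column are assumptions made for illustration, not part of the original question). It also shows a stricter variant that only touches the ordinal suffixes St, Nd, Rd and Th, in case values such as "5A Main St" should keep their uppercase letter:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({'data': ['21St StNew York', '136Th Ave', '5A Main St']})

# Lowercase any uppercase letter that directly follows a digit.
df['dat'] = df['data'].str.replace(
    r'(?<=\d)[A-Z]', lambda x: x.group(0).lower(), regex=True
)

# Stricter variant: only lowercase ordinal suffixes that follow a digit
# and end at a word boundary.
df['strict'] = df['data'].str.replace(
    r'(?<=\d)(?:St|Nd|Rd|Th)\b', lambda x: x.group(0).lower(), regex=True
)

print(df)
```

The lookbehind keeps the digits out of the match, so only the letters are rewritten and no capture group is needed.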
