
Python pandas regular expression replace part of the matching pattern

I've got a bunch of addresses like so:

df['street'] =
    5311 Whitsett Ave 34
    355 Sawyer St
    607 Hampshire Rd #358
    342 Old Hwy 1
    267 W Juniper Dr 402

What I want to do is remove the numbers at the end of the street part of the addresses, to get:

df['street'] =
    5311 Whitsett Ave
    355 Sawyer St
    607 Hampshire Rd
    342 Old Hwy 1
    267 W Juniper Dr

My regular expression looks like this:

df['street'] = df.street.str.replace(r"""\s(?:dr|ave|rd)[^a-zA-Z]\D*\d+$""", '', case=False)

which gives me this:

df['street'] =
    5311 Whitsett
    355 Sawyer St
    607 Hampshire
    342 Old Hwy 1
    267 W Juniper

It dropped the words 'Ave', 'Rd' and 'Dr' from my original street addresses. Is there a way to keep part of the regular expression pattern (in my case 'Ave', 'Rd', 'Dr') and replace the rest?

EDIT: Notice the address 342 Old Hwy 1. I do not want to take out the number in such cases. That's why I specified the patterns ('Ave', 'Rd', 'Dr', etc.), to have better control over which addresses get changed.

    import re

    df_street = '''
        5311 Whitsett Ave 34
        355 Sawyer St
        607 Hampshire Rd #358
        342 Old Hwy 1
        267 W Juniper Dr 402
        '''
    # The digits at the end of a line are preceded by one of (Ave, Rd, Dr) and whitespace,
    # may be prefixed by a '#', and may be followed by spaces before the newline.
    # The street type is captured in group 1 and written back with \1.
    # flags must be passed by keyword: the fourth positional argument of re.sub is count.
    df_street = re.sub(r'(Ave|Rd|Dr)\s+#?\d+\s*\n', r'\1\n', df_street, flags=re.IGNORECASE)
    print(df_street)

    5311 Whitsett Ave
    355 Sawyer St
    607 Hampshire Rd
    342 Old Hwy 1
    267 W Juniper Dr
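
The same capture-group idea can probably be applied directly to the DataFrame column instead of to one multi-line string. The following is a minimal sketch, not a definitive solution: the sample frame is rebuilt from the addresses in the question, and `regex=True` is passed explicitly on the assumption that the installed pandas version may otherwise treat the pattern as a literal string.

```python
import pandas as pd

# Sample data rebuilt from the question (an assumption about the real frame's layout).
df = pd.DataFrame({"street": [
    "5311 Whitsett Ave 34",
    "355 Sawyer St",
    "607 Hampshire Rd #358",
    "342 Old Hwy 1",
    "267 W Juniper Dr 402",
]})

# Group 1 captures the street type; the replacement \1 puts it back,
# so only the trailing unit number (with an optional '#') is dropped.
pattern = r"\b(ave|rd|dr)\s+#?\d+\s*$"
df["street"] = df["street"].str.replace(pattern, r"\1", case=False, regex=True)

print(df["street"].tolist())
```

Because the pattern only matches after 'Ave', 'Rd' or 'Dr', the address 342 Old Hwy 1 is left untouched.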

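You could use the following regex, which strips any trailing number (optionally prefixed by '#'). Note that it also removes the trailing 1 from 342 Old Hwy 1, which the question wants to keep: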

>>> import re
>>> example_str = "607 Hampshire Rd #358"
>>> re.sub(r"\s*\#?[^\D]+\s*$", r"", example_str)
'607 Hampshire Rd'
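
The regex above removes any trailing number, whatever precedes it. If the removal should only happen after a known street type, one possible variant is to assert the street type with lookbehinds, so that it is never part of the match and nothing needs to be put back. This is a sketch under assumptions: `TRAILING_UNIT` and `strip_unit` are hypothetical names, and the list of street types is just the three from the question.

```python
import re

# Python's re module requires fixed-width lookbehinds, so each street type
# gets its own lookbehind instead of one alternation of different lengths.
TRAILING_UNIT = re.compile(
    r"(?:(?<=\bave)|(?<=\brd)|(?<=\bdr))\s+#?\d+\s*$",
    re.IGNORECASE,
)

def strip_unit(street: str) -> str:
    """Remove a trailing unit number only when it follows Ave, Rd or Dr."""
    return TRAILING_UNIT.sub("", street)

print(strip_unit("607 Hampshire Rd #358"))  # 607 Hampshire Rd
print(strip_unit("342 Old Hwy 1"))          # 342 Old Hwy 1
```

The `\b` inside each lookbehind keeps words that merely end in those letters, such as "Third", from being treated as a street type.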
