
Using regex to find (and replace) phone number extensions (Python)

I'm currently trying to find phone number extensions in a pandas Series, for example 'Ext: 123'. The extension can appear in a cell either on its own (as in that example) or after a phone number, e.g. 123 456 789 / Ext: 4502.

The extensions can also come in varying formats, such as Ex.430 (missing the letter t, and no space after the punctuation mark). Therefore, I want to find all sequences in the Series that have 1-3 letters, followed by zero or more symbols, zero or more spaces, and then 2 to 6 digits.

Ideally, I would also replace these with the correct format, which is Ext: 32 (the number can be up to 6 digits long).
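A minimal sketch of that find-and-replace, assuming the data is a pandas Series of strings and that "symbols" means any character that is neither a word character nor whitespace (`[^\w\s]`). The sample values and the name `EXT_PATTERN` are hypothetical. `\b` on both ends keeps the pattern from starting in the middle of a longer word or stopping in the middle of a longer number, so something like `Extension 123` (more than 3 letters) is deliberately left alone.

```python
import pandas as pd

# Hypothetical sample values in the formats described in the question
phones = pd.Series(["Ext: 123", "123 456 789 / Ext: 4502", "Ex.430", "ext 32", "123 456 789"])

# 1-3 letters, zero or more symbols (non-word, non-space), zero or more spaces,
# then 2-6 digits captured as group 1; \b anchors both ends at word boundaries.
EXT_PATTERN = r"\b[A-Za-z]{1,3}[^\w\s]*\s*(\d{2,6})\b"

# Finding: pull out just the digits (NaN where no extension is present)
ext_digits = phones.str.extract(EXT_PATTERN, expand=False)

# Replacing: rewrite every match in place as "Ext: <digits>"
normalized = phones.str.replace(EXT_PATTERN, r"Ext: \1", regex=True)

print(ext_digits.tolist())  # ['123', '4502', '430', '32', nan]
print(normalized.tolist())
# ['Ext: 123', '123 456 789 / Ext: 4502', 'Ext: 430', 'Ext: 32', '123 456 789']
```

Because it follows the "1-3 letters" rule literally, any short word directly before a number (e.g. `Tel 123`) would also be rewritten; if extension markers always start with an E, a stricter prefix such as `[Ee][A-Za-z]{0,2}` would avoid that.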

Here is my regex so far:

({'\D{1,3}\W*\s*\d{2,6}]'
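A quick check with Python's `re` module shows why this attempt misbehaves. It assumes the leading `({` and the trailing `]` are copy/paste debris and drops them; as written, the unmatched `]` would be treated as a literal `]` character, so the pattern would only match text that contains one. The bigger problem is that `\D` matches any non-digit, including spaces and `/`, so plain groups of the main phone number qualify too.

```python
import re

# The pattern from the question, minus the stray "({" prefix and "]" suffix
pattern = r"\D{1,3}\W*\s*\d{2,6}"

# \D accepts spaces, so " 456" and " 789" from the main number also match
print(re.findall(pattern, "123 456 789 / Ext: 4502"))  # [' 456', ' 789', 'Ext: 4502']
print(re.findall(pattern, "Ex.430"))                   # ['Ex.430']
```

Restricting the leading part to letters, for example `[A-Za-z]{1,3}`, removes those false positives from the main number.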

I have also tried other variations, but none of them worked either.

I would appreciate any help, thanks.

You can just split the column on alphabetic characters (plus the colon), optionally followed by a period:

df['phones'].str.split(r'[A-Za-z:]+\.?', expand=True)
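Applied to a few sample values, that split leaves the main number (if any) in column 0 and the extension digits in column 1; a short follow-up can then strip the leftovers and rebuild the `Ext: ` prefix. The sample data and the `number`/`ext` column names are hypothetical. This sketch assumes each cell contains at most one run of letters: letters elsewhere in a cell would produce extra columns, and a cell with no extension would get a missing value in column 1 that needs separate handling.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"phones": ["Ext: 123", "123 456 789 / Ext: 4502", "Ex.430"]})

# Split on runs of letters/colons plus an optional period (the pattern from the answer)
parts = df["phones"].str.split(r"[A-Za-z:]+\.?", expand=True)

# Column 0: text before the letters (main number plus a trailing " / ", or empty)
# Column 1: the extension digits, possibly with leading whitespace
df["number"] = parts[0].str.strip(" /")
df["ext"] = "Ext: " + parts[1].str.strip()

print(df[["number", "ext"]])
```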
