I'm trying to find phone number extensions in a pandas Series, for example 'Ext: 123'. The extension can appear in the cell either on its own (as above) or after a phone number, e.g. 123 456 789 / Ext: 4502.
The extensions can also come in varying formats, such as Ex.430 (missing the letter t, no space after the punctuation mark). Therefore, I want to find all sequences in the Series that consist of 1-3 letters, followed by zero or more symbols, zero or more spaces, and then 2 to 6 digits.
Ideally, I would also replace these with the correct format, which is Ext: 32 (the number can be up to 6 digits long).
Here is my regex so far:
({'\D{1,3}\W*\s*\d{2,6}]'
I have also used other variations, but those didn't work either.
I would appreciate any help, thanks.
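You can simply split the column on runs of alphabetic characters (plus the colon, and an optional trailing period). With expand=True, the part before the label and the extension digits end up in separate columns.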
df['phones'].str.split(r'[A-Za-z:]+\.?', expand=True)
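If you also want to rewrite the extensions into the 'Ext: N' format, one possible approach is a regex substitution rather than a split. The following is only a sketch, using plain `re` so the pattern can be tried without a DataFrame. The names `EXT_PATTERN`, `find_extension` and `normalize_extension` are made up for illustration, and the lookbehind (which stops the pattern from matching the tail of a longer word) and the `(?!\d)` check (which rejects runs longer than 6 digits) are my assumptions about what you want, not something stated in the question.

```python
import re

# 1-3 letters (not preceded by another letter), optional punctuation,
# optional whitespace, then 2-6 digits that are not followed by a further digit.
# The digits are captured in group 1.
EXT_PATTERN = re.compile(r'(?<![A-Za-z])[A-Za-z]{1,3}[^\w\s]*\s*(\d{2,6})(?!\d)')

def find_extension(text):
    """Return the extension digits as a string, or None if none is found."""
    match = EXT_PATTERN.search(text)
    return match.group(1) if match else None

def normalize_extension(text):
    """Rewrite any matched extension in the text to the form 'Ext: N'."""
    return EXT_PATTERN.sub(r'Ext: \1', text)

# Sample values taken from the question, plus one cell without an extension.
samples = ['Ext: 123', '123 456 789 / Ext: 4502', 'Ex.430', '123 456 789']
extensions = [find_extension(s) for s in samples]
normalized = [normalize_extension(s) for s in samples]
```

On a Series, the same pattern string should work with `df['phones'].str.extract(...)` to pull out the digits and with `df['phones'].str.replace(pattern, r'Ext: \1', regex=True)` to normalize in place. Note that any 1-3 letter label followed by digits (for example 'No. 12') would also be rewritten, so tighten the letter part if your data contains such cases.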