
Remove the first character from a pandas column if it is the number 1

The code below removes any dashes in the phone number columns. How do I also remove the first character of a phone number in those columns if the number begins with 1? I basically want every value to be a ten-digit number with no leading 1.

import pandas as pd
import numpy as np
import re

df = pd.read_csv('test2.csv')

cols_to_check = ['Phone', 'phone', 'Phone.1']

df[cols_to_check] = df[cols_to_check].replace({'-':''}, regex=True)

df.to_csv('testnew.csv', mode = 'w', index=False)

You can use apply to run a function with non-trivial logic over each value in a column. This assumes every value in the column is a string:

for col in cols_to_check:
    df[col] = df[col].apply(lambda x : x[1:] if x.startswith("1") else x)

See also this overview of apply.
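
A slightly more defensive variant of the same idea is sketched below. The helper name strip_leading_one and the sample data are made up for illustration. It assumes that a leading 1 should only be dropped when the cleaned value has 11 digits, and that missing values should be left alone; neither rule comes from the question, so adjust as needed.

```python
import pandas as pd


def strip_leading_one(value):
    """Hypothetical helper: remove dashes, then drop a leading '1' from 11-digit strings."""
    if not isinstance(value, str):
        return value  # leave NaN / None / non-string cells untouched
    digits = value.replace("-", "")
    # Assumption: only an 11-digit value starting with '1' carries a prefix to drop.
    if len(digits) == 11 and digits.startswith("1"):
        return digits[1:]
    return digits


# Illustrative data, not taken from the original CSV.
sample = pd.DataFrame({"Phone": ["1-123-456-7890", "123-456-7890", None]})
sample["Phone"] = sample["Phone"].map(strip_leading_one)
```

The length check matters for ten-digit values that happen to start with 1 (such as 123-456-7890 above): a plain startswith("1") test would shorten them to nine digits.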

I'd use applymap

Option 1
Use str.replace to replace '-' with ''. I'm assuming that we can always take the last 10 digits.

df[cols_to_check].applymap(lambda x: x.replace('-', '')[-10:])

        Phone       phone      Phone1
0  1234567890  1234567890  1234567890
1  1234567890  1234567890  1234567890
2  1234567890  1234567890  1234567890
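
To make the "last 10 digits" assumption concrete, here is a small plain-Python sketch of the same slicing logic. The function name last_ten and the extra inputs are illustrative only.

```python
def last_ten(value):
    """Same logic as the lambda in Option 1: drop dashes, keep the last 10 characters."""
    return value.replace("-", "")[-10:]


examples = {
    "1-123-456-7890": last_ten("1-123-456-7890"),    # '1234567890' (leading 1 dropped)
    "123-4567890": last_ten("123-4567890"),          # '1234567890' (already 10 digits)
    "456-7890": last_ten("456-7890"),                # '4567890' (short input passes through)
    "44-20-7946-0958": last_ten("44-20-7946-0958"),  # '2079460958' (longer input is truncated)
}
```

Slicing never checks that the dropped prefix is actually a 1, so it only gives the intended result when every value is either 10 digits or 11 digits with a leading 1.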

Option 2
Use re.sub
If you want to strip all non-digit characters rather than just dashes, use the regex module re and take the same approach as in Option 1.

import re

df[cols_to_check].applymap(lambda x: re.sub(r'\D', '', x)[-10:])

        Phone       phone      Phone1
0  1234567890  1234567890  1234567890
1  1234567890  1234567890  1234567890
2  1234567890  1234567890  1234567890
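
One caveat for newer pandas: DataFrame.applymap was deprecated in pandas 2.1 in favor of DataFrame.map. The sketch below picks whichever is available and otherwise mirrors Option 2; the variable names clean, elementwise, and result are illustrative, and the data repeats the Setup section.

```python
import re

import pandas as pd

df = pd.DataFrame(dict(
    Phone=['1-123-456-7890', '123-4567890', '11234567890'],
    phone=['1-123-456-7890', '123-4567890', '11234567890'],
    Phone1=['1-123-456-7890', '123-4567890', '11234567890'],
    Other=[1, 2, 3],
))
cols_to_check = ['Phone', 'phone', 'Phone1']


def clean(value):
    # Strip every non-digit character, then keep the last 10 digits.
    return re.sub(r'\D', '', value)[-10:]


subset = df[cols_to_check]
# Prefer DataFrame.map (pandas >= 2.1); fall back to applymap on older versions.
elementwise = subset.map if hasattr(pd.DataFrame, "map") else subset.applymap
result = elementwise(clean)
```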

Option 3
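We can also use the pd.Series.str string accessor, but we need to collapse the columns into a Series first. Pass regex=True explicitly, because str.replace treats the pattern as a literal string by default in pandas 2.0 and later.

df[cols_to_check].stack().str.replace(r'\D', '', regex=True).str[-10:].unstack()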

        Phone       phone      Phone1
0  1234567890  1234567890  1234567890
1  1234567890  1234567890  1234567890
2  1234567890  1234567890  1234567890
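
Another possibility, closer to the DataFrame.replace call in the question, is to chain two regex replacements. This is only a sketch: the second pattern assumes a leading 1 should be removed only when exactly ten digits follow it, and the name cleaned is illustrative. The data repeats the Setup section.

```python
import pandas as pd

df = pd.DataFrame(dict(
    Phone=['1-123-456-7890', '123-4567890', '11234567890'],
    phone=['1-123-456-7890', '123-4567890', '11234567890'],
    Phone1=['1-123-456-7890', '123-4567890', '11234567890'],
    Other=[1, 2, 3],
))
cols_to_check = ['Phone', 'phone', 'Phone1']

cleaned = (
    df[cols_to_check]
    # Step 1: remove every non-digit character (dashes, spaces, parentheses).
    .replace(to_replace=r'\D', value='', regex=True)
    # Step 2: remove a leading 1 only when exactly ten digits follow it.
    .replace(to_replace=r'^1(?=\d{10}$)', value='', regex=True)
)
```

Unlike slicing with [-10:], this leaves values of any other length unchanged, which makes malformed numbers easier to spot afterwards.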

Setup

df = pd.DataFrame(dict(
    Phone=['1-123-456-7890', '123-4567890', '11234567890'],
    phone=['1-123-456-7890', '123-4567890', '11234567890'],
    Phone1=['1-123-456-7890', '123-4567890', '11234567890'],
    Other=[1, 2, 3]
))

cols_to_check = ['Phone', 'phone', 'Phone1']

df

   Other           Phone          Phone1           phone
0      1  1-123-456-7890  1-123-456-7890  1-123-456-7890
1      2     123-4567890     123-4567890     123-4567890
2      3     11234567890     11234567890     11234567890


 