
Pandas - Remove leading Zeros from String but not from Integers

I currently have a column in my dataset that looks like the following:

Identifier
09325445
02242456
00MatBrown
0AntonioK
065824245

The column's data type is object. What I'd like to do is remove the leading zeros only from rows where the value is a text string, and keep the leading zeros in rows where the value is a number (digits only).

Result I'm looking to achieve:

Identifier
09325445
02242456
MatBrown
AntonioK
065824245

Code I am currently using (which isn't working):

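def removeLeadingZeroFromString(row):
    # Fails: df['Identifier'] == str builds a whole boolean Series (comparing
    # each value to the type object), and `if` on a Series raises ValueError
    if df['Identifier'] == str:
        # Also wrong: uses df instead of row, and strip('0') removes
        # trailing zeros as well as leading ones
        return df['Identifier'].str.strip('0')
    else:
        return df['Identifier']

df['Identifier'] = df.apply(lambda row: removeLeadingZeroFromString(row), axis=1)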
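For comparison, a minimal sketch of how the row-wise idea could be made to work, assuming every value in the column is a str (as in the sample data). The function name remove_leading_zero_from_string is a hypothetical rename of the asker's helper. It checks the content of each value with str.isdigit instead of its type, uses lstrip so trailing zeros survive, and applies the function to the single column rather than to whole rows:

```python
import pandas as pd

df = pd.DataFrame({
    'Identifier': ['09325445', '02242456', '00MatBrown', '0AntonioK',
                   '065824245']
})

def remove_leading_zero_from_string(value):
    # Every value is a str, so inspect its content, not its type:
    # digit-only strings keep their zeros, anything else is stripped.
    if isinstance(value, str) and not value.isdigit():
        return value.lstrip('0')  # lstrip, not strip: trailing zeros stay
    return value

# Series.apply passes one value at a time, so no axis=1 row handling is needed
df['Identifier'] = df['Identifier'].apply(remove_leading_zero_from_string)
print(df)
```

This is a per-value Python loop, so the vectorised mask approaches in the answers are likely to be faster on large frames.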

One approach is to try to convert Identifier with to_numeric, test where the converted values are NaN (isna), and use that mask to apply str.lstrip (which strips leading zeros only) to just the values that could not be converted:

m = pd.to_numeric(df['Identifier'], errors='coerce').isna()
df.loc[m, 'Identifier'] = df.loc[m, 'Identifier'].str.lstrip('0')

df:

  Identifier
0   09325445
1   02242456
2   MatBrown
3   AntonioK
4  065824245
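To make the masking step concrete, here is a self-contained sketch that prints the intermediate mask. It uses only pd.to_numeric(..., errors='coerce'), Series.isna and Series.str.lstrip from the answer above; the variable name converted is introduced here for illustration. Note that the converted numbers are only used to build the mask, so the original strings (with their zeros) are what get written back:

```python
import pandas as pd

df = pd.DataFrame({
    'Identifier': ['09325445', '02242456', '00MatBrown', '0AntonioK',
                   '065824245']
})

# Digit-only strings parse to numbers; anything unparseable becomes NaN
converted = pd.to_numeric(df['Identifier'], errors='coerce')
m = converted.isna()
print(m.tolist())  # [False, False, True, True, False]

# Only the rows that failed to convert are stripped; the others are untouched
df.loc[m, 'Identifier'] = df.loc[m, 'Identifier'].str.lstrip('0')
print(df['Identifier'].tolist())
```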

Alternatively, a less robust approach, which only works when the numbers are strings made up entirely of digits, is to test where the values are not str.isnumeric:

m = ~df['Identifier'].str.isnumeric()
df.loc[m, 'Identifier'] = df.loc[m, 'Identifier'].str.lstrip('0')

*NOTE: This fails easily (for example on decimals or signed numbers); to_numeric is the much better approach if you need to handle all number types.

Sample Frame:

df = pd.DataFrame({
    'Identifier': ['0932544.5', '02242456']
})

Sample results with isnumeric:

  Identifier
0   932544.5  # 0 Stripped
1   02242456
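A side-by-side sketch of the two masks makes the difference visible. The extra values '-042' and '0Abc' are hypothetical test inputs added here, not from the question. str.isnumeric rejects the decimal point and the minus sign, so those rows are flagged for stripping, while to_numeric parses them and only flags the genuinely non-numeric value:

```python
import pandas as pd

# '0932544.5' comes from the sample frame; '-042' and '0Abc' are extra test values
df = pd.DataFrame({'Identifier': ['0932544.5', '02242456', '-042', '0Abc']})

# True means "would be stripped"
mask_isnumeric = ~df['Identifier'].str.isnumeric()
mask_to_numeric = pd.to_numeric(df['Identifier'], errors='coerce').isna()

print(mask_isnumeric.tolist())   # [True, False, True, True]
print(mask_to_numeric.tolist())  # [False, False, False, True]
```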

DataFrame and imports:

import pandas as pd

df = pd.DataFrame({
    'Identifier': ['09325445', '02242456', '00MatBrown', '0AntonioK',
                   '065824245']
})

Another option is to use str.replace with a regex and a positive lookahead:

>>> df['Identifier'].str.replace(r'^0+(?=[a-zA-Z])', '', regex=True)
0     09325445
1     02242456
2     MatBrown
3     AntonioK
4    065824245
Name: Identifier, dtype: object

Regex: replace one or more 0s (0+) at the start of the string (^) if they are followed by a letter ([a-zA-Z]). The lookahead ((?=...)) only checks for the letter without consuming it, so the letter itself is kept.
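The same pattern can be tried on plain strings with Python's re module to see exactly which values it touches. The samples '0123abc' and '00_id' are hypothetical edge cases added for illustration: because the lookahead only inspects the character immediately after the zeros, a digit or underscore there means nothing is removed:

```python
import re

pattern = re.compile(r'^0+(?=[a-zA-Z])')

samples = ['09325445', '00MatBrown', '0AntonioK', '0123abc', '00_id']
for s in samples:
    # Prints: 09325445, MatBrown, AntonioK, 0123abc, 00_id (right-hand side)
    print(s, '->', pattern.sub('', s))
```

This is a design difference from the to_numeric mask: '0123abc' keeps its zero under the regex, whereas the mask approach would treat the whole value as non-numeric and strip it to '123abc'. Which behaviour is right depends on the data.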
