I currently have a column in my dataset that looks like the following:
Identifier |
---|
09325445 |
02242456 |
00MatBrown |
0AntonioK |
065824245 |
The column data type is object. What I'd like to do is remove the leading zeros only from rows where the value is a string containing letters. I want to keep the leading zeros in rows where the value consists only of digits.
Result I'm looking to achieve:
Identifier |
---|
09325445 |
02242456 |
MatBrown |
AntonioK |
065824245 |
Code I am currently using (that isn't working):
def removeLeadingZeroFromString(row):
    if df['Identifier'] == str:
        return df['Identifier'].str.strip('0')
    else:
        return df['Identifier']

df['Identifier'] = df.apply(lambda row: removeLeadingZeroFromString(row), axis=1)
One approach would be to try to convert Identifier with to_numeric. Test where the converted values are missing with isna, then use this mask to apply str.lstrip (strip leading zeros only) only where the values could not be converted:
m = pd.to_numeric(df['Identifier'], errors='coerce').isna()
df.loc[m, 'Identifier'] = df.loc[m, 'Identifier'].str.lstrip('0')
df:
Identifier
0 09325445
1 02242456
2 MatBrown
3 AntonioK
4 065824245
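As a minimal sketch, the same mask idea can be wrapped in a reusable function that returns a new Series instead of assigning in place. The function name strip_zeros_from_non_numeric is hypothetical, and Series.mask is used here in place of the df.loc assignment shown above:

```python
import pandas as pd


def strip_zeros_from_non_numeric(identifiers: pd.Series) -> pd.Series:
    """Strip leading zeros only from values that cannot be parsed as numbers.

    Hypothetical helper for illustration; not part of pandas.
    """
    # True where the value could NOT be converted to a number.
    not_numeric = pd.to_numeric(identifiers, errors='coerce').isna()
    # Replace only the masked rows with their zero-stripped version.
    return identifiers.mask(not_numeric, identifiers.str.lstrip('0'))


df = pd.DataFrame({
    'Identifier': ['09325445', '02242456', '00MatBrown', '0AntonioK',
                   '065824245']
})
result = strip_zeros_from_non_numeric(df['Identifier'])
print(result.tolist())
```

Because the function does not modify its input, the original column stays available for comparison until the result is assigned back.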
Alternatively, a less robust approach, but one that will work with digit-only strings, would be to test where not str.isnumeric:
m = ~df['Identifier'].str.isnumeric()
df.loc[m, 'Identifier'] = df.loc[m, 'Identifier'].str.lstrip('0')
*NOTE: This fails easily. to_numeric is the much better approach if looking for all number types.
Sample Frame:
df = pd.DataFrame({
'Identifier': ['0932544.5', '02242456']
})
Sample Results with isnumeric:
Identifier
0 932544.5 # 0 Stripped
1 02242456
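The reason for the failure can be seen without pandas: isnumeric only returns True when every character is numeric, so a decimal point or a sign makes it return False. The sketch below compares it against a try/except around float(), which is used here only as a rough stand-in for to_numeric (the two do not accept exactly the same inputs). The helper name parses_as_number is hypothetical:

```python
samples = ['02242456', '0932544.5', '-042', '00MatBrown']


def parses_as_number(text: str) -> bool:
    """Return True if float() accepts the string (hypothetical helper)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


for text in samples:
    # isnumeric rejects '.' and '-', so decimals and negatives look like
    # "strings" to the isnumeric mask and get their leading zeros stripped.
    print(text, text.isnumeric(), parses_as_number(text))
```

Only the first sample passes isnumeric, while the first three are accepted by float(), which is why the decimal value above lost its leading zero.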
DataFrame and imports:
import pandas as pd
df = pd.DataFrame({
'Identifier': ['09325445', '02242456', '00MatBrown', '0AntonioK',
'065824245']
})
Use str.replace with a regex and a positive lookahead:
>>> df['Identifier'].str.replace(r'^0+(?=[a-zA-Z])', '', regex=True)
0 09325445
1 02242456
2 MatBrown
3 AntonioK
4 065824245
Name: Identifier, dtype: object
Regex: replace one or more 0 (0+) at the start of the string (^) only if a letter ([a-zA-Z]) follows the 0s (checked by the lookahead (?=...), which does not consume the letter).
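To see the lookahead's behaviour on edge cases, here is a small sketch with the same pattern using Python's built-in re module. The function name strip_zeros_before_letter and the extra sample values '000' and '0_user' are illustrative assumptions, not part of the original data:

```python
import re

# Same pattern as above: leading zeros, only when a letter follows them.
LEADING_ZEROS_BEFORE_LETTER = re.compile(r'^0+(?=[a-zA-Z])')


def strip_zeros_before_letter(text: str) -> str:
    """Remove leading zeros if they are immediately followed by a letter."""
    return LEADING_ZEROS_BEFORE_LETTER.sub('', text)


identifiers = ['09325445', '00MatBrown', '0AntonioK', '000', '0_user']
cleaned = [strip_zeros_before_letter(value) for value in identifiers]
print(cleaned)
```

Note that values where the zeros are followed by something other than an ASCII letter, such as '0_user', are left unchanged, so the character class may need widening depending on the data.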