
Pandas - Remove leading Zeros from String but not from Integers

I currently have a column in my dataset that looks like the following:

Identifier
09325445
02242456
00MatBrown
0AntonioK
065824245

The column data type is object. What I'd like to do is remove the leading zeros only from rows where the value is a string (contains letters). I want to keep the leading zeros where the value is an integer (digits only).

Result I'm looking to achieve:

Identifier
09325445
02242456
MatBrown
AntonioK
065824245

Code I am currently using (that isn't working):

def removeLeadingZeroFromString(row):
    if df['Identifier'] == str:
        return df['Identifier'].str.strip('0')
    else:
        return df['Identifier']
        
df['Identifier' ] = df.apply(lambda row: removeLeadingZeroFromString(row), axis=1)

One approach would be to try to convert Identifier with to_numeric. Test where the converted values are NaN (isna), and use this mask to str.lstrip (strip leading zeros only) just the values that could not be converted:

m = pd.to_numeric(df['Identifier'], errors='coerce').isna()
df.loc[m, 'Identifier'] = df.loc[m, 'Identifier'].str.lstrip('0')

df:

  Identifier
0   09325445
1   02242456
2   MatBrown
3   AntonioK
4  065824245
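
For comparison, the function in the question fails because `df['Identifier'] == str` compares the whole column against the `str` type instead of testing a single value (using that Series in an `if` raises a ValueError), and `strip('0')` would also remove trailing zeros. A minimal element-wise sketch of the same idea follows. It assumes every value is a string and uses `float()` as a rough stand-in for to_numeric; the helper name `strip_zeros_unless_numeric` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Identifier': ['09325445', '02242456', '00MatBrown', '0AntonioK',
                   '065824245']
})

def strip_zeros_unless_numeric(value):
    # Hypothetical helper: keep the value unchanged if it parses as a
    # number, otherwise strip leading zeros only (lstrip, not strip).
    try:
        float(value)
    except ValueError:
        return value.lstrip('0')
    return value

# map applies the helper to each value of the column
result = df['Identifier'].map(strip_zeros_unless_numeric)
print(result.tolist())
# ['09325445', '02242456', 'MatBrown', 'AntonioK', '065824245']
```

The vectorized mask above is usually preferable on large frames. Note that `float()` also accepts strings such as 'nan' or 'inf', so this sketch is not an exact match for to_numeric.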

Alternatively, a less robust approach, but one that works when the numeric values are digit-only strings, would be to test where not str.isnumeric:

m = ~df['Identifier'].str.isnumeric()
df.loc[m, 'Identifier'] = df.loc[m, 'Identifier'].str.lstrip('0')

*NOTE: This fails easily. to_numeric is the much better approach if looking for all number types.

Sample Frame:

df = pd.DataFrame({
    'Identifier': ['0932544.5', '02242456']
})

Sample Results with isnumeric:

  Identifier
0   932544.5  # 0 Stripped
1   02242456
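
To make the contrast concrete, here is a short sketch that runs the to_numeric mask on the same sample frame. Since '0932544.5' converts to a float, the mask is False for both rows and nothing is stripped.

```python
import pandas as pd

df = pd.DataFrame({
    'Identifier': ['0932544.5', '02242456']
})

# True only where the value could NOT be converted to a number
m = pd.to_numeric(df['Identifier'], errors='coerce').isna()
df.loc[m, 'Identifier'] = df.loc[m, 'Identifier'].str.lstrip('0')

print(m.tolist())                 # [False, False]
print(df['Identifier'].tolist())  # ['0932544.5', '02242456']
```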

DataFrame and imports:

import pandas as pd

df = pd.DataFrame({
    'Identifier': ['09325445', '02242456', '00MatBrown', '0AntonioK',
                   '065824245']
})

Use str.replace with a regex and a positive lookahead:

>>> df['Identifier'].str.replace(r'^0+(?=[a-zA-Z])', '', regex=True)
0     09325445
1     02242456
2     MatBrown
3     AntonioK
4    065824245
Name: Identifier, dtype: object

Regex: remove one or more 0s (0+) at the start of the string (^) if a letter ([a-zA-Z]) follows the 0s ((?=...)).
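
The lookahead can be tried out with the standard `re` module alone. This is only an illustrative sketch: the extra sample strings ('000', '0012abc', '0_x') are assumptions added to show the edge cases, not data from the question.

```python
import re

# Same pattern as above: leading zeros, only if a letter follows them
pattern = re.compile(r'^0+(?=[a-zA-Z])')

samples = ['09325445', '00MatBrown', '0AntonioK', '000', '0012abc', '0_x']
stripped = [pattern.sub('', s) for s in samples]

print(stripped)
# ['09325445', 'MatBrown', 'AntonioK', '000', '0012abc', '0_x']
```

Zeros are removed only when a letter comes directly after them, so '0012abc' and '0_x' stay unchanged. That differs from the to_numeric mask, which would strip any value that is not a valid number.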
