Pandas - Remove leading Zeros from String but not from Integers
I currently have a column in my dataset that looks like the following:
Identifier |
---|
09325445 |
02242456 |
00MatBrown |
0AntonioK |
065824245 |
The column data type is object. What I'd like to do is remove the leading zeros only from rows where the value contains letters (a string). I want to keep the leading zeros where the value is made up of digits only (an integer).
Result I'm looking to achieve:
Identifier |
---|
09325445 |
02242456 |
MatBrown |
AntonioK |
065824245 |
Code I am currently using (that isn't working):
def removeLeadingZeroFromString(row):
    if df['Identifier'] == str:
        return df['Identifier'].str.strip('0')
    else:
        return df['Identifier']

df['Identifier'] = df.apply(lambda row: removeLeadingZeroFromString(row), axis=1)
One approach would be to try to convert Identifier with to_numeric. Test where the converted values are NaN (isna), then use this mask to apply str.lstrip (which strips leading zeros only) to just the values that could not be converted:
m = pd.to_numeric(df['Identifier'], errors='coerce').isna()
df.loc[m, 'Identifier'] = df.loc[m, 'Identifier'].str.lstrip('0')
df:
Identifier
0 09325445
1 02242456
2 MatBrown
3 AntonioK
4 065824245
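For comparison, the function in the question fails because `df['Identifier'] == str` compares the whole column to the type `str`, and using the resulting Series in an `if` raises a ValueError. Below is a minimal per-value sketch of the same idea as the mask above, written in plain Python. The helper name `strip_zeros_unless_numeric` is hypothetical, and `float()` is only a rough stand-in for `pd.to_numeric(..., errors='coerce')` (for example, `float()` also accepts strings such as 'nan' and 'inf').

```python
def strip_zeros_unless_numeric(value: str) -> str:
    """Strip leading zeros only when the value cannot be parsed as a number."""
    try:
        float(value)  # rough stand-in for pd.to_numeric(..., errors='coerce')
    except ValueError:
        # Not a number: remove leading zeros only (lstrip, not strip).
        return value.lstrip('0')
    # Parsed as a number: keep the string unchanged, zeros included.
    return value


identifiers = ['09325445', '02242456', '00MatBrown', '0AntonioK', '065824245']
cleaned = [strip_zeros_unless_numeric(v) for v in identifiers]
print(cleaned)
```

With a DataFrame, the same helper could presumably be applied with `df['Identifier'].map(strip_zeros_unless_numeric)`, although the vectorized mask above is likely to be faster on large columns.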
Alternatively, a less robust approach, but one that will work with number-only strings, would be to test where not str.isnumeric:
m = ~df['Identifier'].str.isnumeric()
df.loc[m, 'Identifier'] = df.loc[m, 'Identifier'].str.lstrip('0')
*NOTE: This fails easily. to_numeric is the much better approach if looking for all number types.
Sample Frame:
df = pd.DataFrame({
    'Identifier': ['0932544.5', '02242456']
})
Sample Results with isnumeric:
Identifier
0 932544.5 # 0 Stripped
1 02242456
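The result above follows from how `isnumeric` works: it is true only when every character is a numeric character, so a decimal point or a sign makes the test fail. The sketch below illustrates this with plain Python strings, on the assumption that `Series.str.isnumeric` behaves like `str.isnumeric` on each element. The helper `parses_as_number` is hypothetical and uses `float()` as an approximation of what `to_numeric` would accept.

```python
def parses_as_number(value: str) -> bool:
    """Return True if float() can parse the string (approximation of to_numeric)."""
    try:
        float(value)
    except ValueError:
        return False
    return True


samples = ['0932544.5', '02242456', '-0123', '00MatBrown']

# Map each sample to (isnumeric result, parses-as-number result).
results = {s: (s.isnumeric(), parses_as_number(s)) for s in samples}

for s, (is_numeric, is_number) in results.items():
    print(f'{s!r}: isnumeric={is_numeric}, parses_as_number={is_number}')
```

Any row where the two checks disagree (here '0932544.5' and '-0123') would have its leading zeros stripped by the isnumeric mask but kept by the to_numeric mask.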
DataFrame and imports:
import pandas as pd
df = pd.DataFrame({
    'Identifier': ['09325445', '02242456', '00MatBrown', '0AntonioK',
                   '065824245']
})
Use replace with regex and a positive lookahead:
>>> df['Identifier'].str.replace(r'^0+(?=[a-zA-Z])', '', regex=True)
0 09325445
1 02242456
2 MatBrown
3 AntonioK
4 065824245
Name: Identifier, dtype: object
Regex: replace one or more 0 (0+) at the start of the string (^) if there is a letter ([a-zA-Z]) immediately after the 0s ((?=...)).
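To see the lookahead in isolation, here is a small sketch using the standard `re` module with the same pattern on plain strings. The function name `strip_zeros_before_letter` and the extra test values are illustrative and not part of the original answer.

```python
import re

# Same pattern as above: leading zeros, only if a letter follows immediately.
LEADING_ZEROS_BEFORE_LETTER = re.compile(r'^0+(?=[a-zA-Z])')


def strip_zeros_before_letter(value: str) -> str:
    """Remove leading zeros when the first non-zero character is an ASCII letter."""
    return LEADING_ZEROS_BEFORE_LETTER.sub('', value)


examples = ['09325445', '00MatBrown', '0AntonioK', '00123abc']
stripped = [strip_zeros_before_letter(v) for v in examples]
print(stripped)
```

Note that '00123abc' is left unchanged, because the character after the zeros is a digit, not a letter. The to_numeric mask would treat that value as non-numeric and strip it to '123abc', so the two approaches differ for identifiers that mix digits and letters this way.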