Remove leading zeroes pandas
For example, I have a data frame like this:
import pandas as pd
nums = {'amount': ['0324','S123','0010', None, '0030', 'SA40', 'SA24']}
df = pd.DataFrame(nums)
I need to remove all leading zeroes and replace None values with zeros.
I did it with loops, but for large frames that is not fast enough. I'd like to rewrite it using vectorized operations.
You can try str.replace:
df['amount'].str.replace(r'^(0+)', '', regex=True).fillna('0')
0 324
1 S123
2 10
3 0
4 30
5 SA40
6 SA24
Name: amount, dtype: object
df['amount'] = df['amount'].str.lstrip('0').fillna(value='0')
I see there is already a nice answer from @Epsi95, but you can also try a regex with a character set:
>>> df['amount'].str.replace(r'^[0]*', '', regex=True).fillna('0')
0 324
1 S123
2 10
3 0
4 30
5 SA40
6 SA24
^[0]*
^ asserts position at the start of the string
[0] matches a single character from the set, which here contains only the character 0
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
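The pattern can also be checked outside pandas with Python's built-in re module. This is only a minimal sketch: the sample list reuses values from the question, and the variable names are illustrative.

```python
import re

# Sample values taken from the question (None left out, since re works on str only)
samples = ['0324', 'S123', '0010', '0030', 'SA40']

# ^[0]* matches zero or more '0' characters anchored at the start of the string.
# For strings without leading zeros it matches the empty string, so nothing changes.
stripped = [re.sub(r'^[0]*', '', s) for s in samples]

# A one-character set is equivalent to the bare character, so ^0* behaves the same.
same_as_plain = [re.sub(r'^0*', '', s) for s in samples]

print(stripped)
print(stripped == same_as_plain)
```

Since [0] contains a single character, the character set is a matter of style rather than a different behaviour.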
Step by step:
Remove all leading zeros:
Use str.lstrip, which returns a copy of the string with leading characters removed (based on the string argument passed). Here,
df['amount'] = df['amount'].str.lstrip('0')
For more, see https://www.programiz.com/python-programming/methods/string/lstrip
Replace None with zeros:
Use fillna, which also works with missing values other than None (such as NaN). Here,
df['amount'].fillna(value='0')
For more, see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html
Result in one line:
df['amount'] = df['amount'].str.lstrip('0').fillna(value='0')
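One edge case is worth checking with this approach: a value made up entirely of zeros is stripped down to an empty string. The sketch below uses a small made-up Series (not the data from the question) to show it.

```python
import pandas as pd

# Hypothetical sample: one normal value, one missing value, and two all-zero strings
s = pd.Series(['0324', None, '0', '000'])

# lstrip('0') removes every leading '0', so '0' and '000' become ''.
# fillna('0') only fills the missing value; it does not touch empty strings.
result = s.str.lstrip('0').fillna('0')

print(result.tolist())
```

If all-zero strings can occur in the column and should end up as '0', the empty strings need separate handling.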
To make sure a single 0 or the last 0 is not removed, you can use:
df['amount'] = df['amount'].str.replace(r'^(0+)(?!$)', '', regex=True).fillna('0')
The regex (?!$) ensures that the matched substring (the leading zeroes) does not include the last 0, which effectively keeps the last 0.
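The effect of the negative lookahead can be seen on single strings with Python's re module. This is a minimal sketch with a few hand-picked inputs; the names pattern, samples and results are illustrative.

```python
import re

# Leading zeros, but the match may not end at the end of the string
pattern = re.compile(r'^(0+)(?!$)')

samples = ['0324', '0', '000', 'S123']
results = {s: pattern.sub('', s) for s in samples}

# '0'   : 0+ can only match the whole string, the lookahead fails, nothing is removed
# '000' : 0+ backtracks to '00' so that one '0' remains after the match
for original, cleaned in results.items():
    print(repr(original), '->', repr(cleaned))
```

The greedy 0+ gives back one character only when it would otherwise consume the whole string, so ordinary values such as '0324' are stripped exactly as before.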
Input Data
nums = {'amount': ['0324','S123','0010', None, '0030', 'SA40', 'SA24', '0', '000']}
df = pd.DataFrame(nums)
amount
0 0324
1 S123
2 0010
3 None
4 0030
5 SA40
6 SA24
7 0 <== Added a single 0 here
8 000 <== Added a sequence of all 0's here
Output
print(df)
amount
0 324
1 S123
2 10
3 0
4 30
5 SA40
6 SA24
7 0 <== Single 0 is not removed
8 0 <== Last 0 is kept